1. Introduction

A major challenge in prostate cancer (PCa) research is to distinguish aggressive from indolent
disease. Although the D’Amico risk stratification is helpful and widely used to guide PCa
treatment, it relies on a few standard clinical parameters (prostate-specific antigen (PSA), stage,
and grade) and cannot always reliably distinguish patients who will die from PCa from those
who will not. The result is over-treatment and unnecessary side effects in many men with “low-risk”
disease, with PCa-specific mortality prevented in only a small minority. On the other hand, some
cancers may be destined to recur despite aggressive multi-modality therapy. There is an urgent
need for additional biologically relevant markers to improve prognostication beyond algorithms
based solely on PSA, stage, and grade. Ideally, such biomarkers could also provide clinical
guidance for alternative or novel treatments. Our prospective studies of adiposity, physical
activity, and several individual biomarkers in two large Harvard cohorts demonstrate that
markers of energy metabolism such as insulin, adipokines, and de novo fatty acid synthesis may
play important roles in the risk of lethal PCa. A metabolite profiling platform recently developed
by Dr. Clish’s laboratory at the Broad Institute of MIT/Harvard has shown promising
potential along this line of research. This technology has identified in vitro an aberrant activation
of the PI3K downstream target as a common molecular event in cancer pathology and obesity;
revealed significant associations of several amino acids and lipid metabolites in human plasma as
signatures of insulin resistance and diabetes risk; and identified signatures of exercise
performance and cardiovascular disease susceptibility, supporting the validity of metabolic profiling.
In addition, the methods have passed our own rigorous reproducibility assessments. Together,
these provide important groundwork for the current proposal. The current study uses a
targeted, LC-MS-based metabolite profiling platform to measure and compare metabolic profiles
of prediagnostic blood samples collected from men subsequently diagnosed with PCa and from a sample
of men who remained cancer-free in the Physicians’ Health Study (PHS). We will also test whether these
relationships are independent of the known metabolic risk factors (overweight/obesity, insulin
marker C-peptide, insulin-like growth factor I (IGF-I), IGF binding protein 3 (IGFBP-3), and
adiponectin) as well as the clinical characteristics defined as the D’Amico risk.


2.  Keywords 

Prostate  cancer  survivorship,  metabolomic  profiling,  metabolic  biomarkers 


3.  Accomplishments 

What  were  the  major  goals  of  the  project? 

The  three  original  aims  were: 

Aim  1 :  Explore  and  validate  the  metabolomic  footprints  for  normal  controls  (n=50  x  2  cohorts) 
vs.  three  groups  of  cases  (metastatic  PC  at  diagnosis,  initially  localized  PC  and  long-time 
survivors,  and  initially  localized  PC  but  died  of  PC;  n=50  in  each  of  the  three  groups,  total  n=150 
cases  x  2  cohorts); 

Aim  2:  Among  men  with  initial  localized  PCa,  explore  and  validate  the  metabolomic  footprints 
for  long-term  survivors  vs.  men  who  subsequently  died  of  PCa; 

Aim  3:  Test  and  validate  whether  these  associations  are  independent  of  the  known  metabolic  risk 
factors  (overweight/obese,  insulin  marker  C-peptide,  insulin-like  growth  factor  I  (IGF-I),  IGF 
binding protein 3 (IGFBP-3), and adiponectin), as well as the clinical characteristics defined as
the  D’Amico  risk. 

In the original protocol, we planned to measure samples of PCa cases from both the Health
Professionals Follow-up Study (HPFS) and PHS. However, the HPFS team has received a separate
grant for metabolomics measurement. We therefore amended our study population to exclude HPFS
data and instead use the funds to increase the sample size from 50 to 100 for each of the
proposed aims, so that the total sample size remains unchanged.


Modified  aims: 

Aim 1: 1a. Compare the metabolomic profiles of 100 men with "high risk" (T1-3 and Gleason 8+) or
metastasis at diagnosis with those of 100 healthy men without cancer (at least at the time when
the cases were diagnosed); 1b. Among men with "high risk" (T1-3 and Gleason 8+) or metastasis at
diagnosis, compare the metabolomic profiles of men who died of the cancer with those of men who
were still alive by 2012;

Aim 2: Compare the metabolomic profiles of 100 men with "low-intermediate risk" (T1-3
and Gleason 2-7) PCa who died of the cancer with those of men (n=100) who survived at least
10 years after diagnosis.

Aim  3:  Test  whether  these  associations  are  independent  of  the  known  metabolic  risk  factors 
(overweight/obese,  insulin  marker  C-peptide,  insulin-like  growth  factor  I  (IGF-I),  IGF  binding 
protein  3  (IGFBP-3),  and  adiponectin),  as  well  as  the  clinical  characteristics  defined  as  the 
D'Amico  risk. 

What  was  accomplished  under  these  goals? 

Based on the available samples, we used a matched case-control design to select the blood
samples to be measured. The selected samples are currently being analyzed in the lab, and
we expect to receive the lab data by early 2015. While waiting for the lab results, we have
been analyzing the related metabolic biomarkers and writing up manuscripts.

I. Sample  selection 

Blood samples were selected with a matched case-control design. The matching method for each
specific aim is described below.

Aim 1: Study Population:

1) All participants in the cohort;

2) Blood volume >100 ml;

Cases:

1) Incident PCa cases;

2) Localized and high-grade cases, or metastases at diagnosis;

3) Status in mortality file: died of PCa or alive;

Control Matching criteria:

1) Same age group at baseline: 40-<50, 50-<60, 60-<70, 70+ years

2) Same fasting status: >=8 hrs or <8 hrs

3) Controls have no cancer, or have cancer diagnosed after the last PCa diagnosis
date in the same age and fasting group (we only have cancer information, no other disease
information)

4) Frequency matching: select the same proportion in each age group, with 100 controls in total.

Program:  SAS  proc  surveyselect 
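
The frequency-matching step for Aim 1 was run in SAS; its logic can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the study's program: the helper `frequency_match`, the field names (`age_group`, `fasting`), and the toy records are hypothetical, and the eligibility rule on cancer diagnosis date (criterion 3) is assumed to have been applied to the candidate pool beforehand.

```python
import random
from collections import Counter, defaultdict


def frequency_match(cases, candidates, n_controls, seed=0):
    """Draw controls so that each (age_group, fasting) stratum holds the
    same share of controls as it does of cases (hypothetical helper)."""
    rng = random.Random(seed)
    case_counts = Counter((c["age_group"], c["fasting"]) for c in cases)

    # Group the candidate controls by the same stratum key.
    pool = defaultdict(list)
    for person in candidates:
        pool[(person["age_group"], person["fasting"])].append(person)

    selected = []
    for stratum, n_cases in sorted(case_counts.items()):
        # Quota proportional to the case distribution; rounding means the
        # total can differ slightly from n_controls in general.
        quota = round(n_controls * n_cases / len(cases))
        eligible = pool[stratum]
        selected.extend(rng.sample(eligible, min(quota, len(eligible))))
    return selected


# Toy data (invented for illustration only).
cases = (
    [{"age_group": "50-<60", "fasting": ">=8"}] * 6
    + [{"age_group": "60-<70", "fasting": "<8"}] * 4
)
candidates = [
    {"id": i,
     "age_group": "50-<60" if i % 2 else "60-<70",
     "fasting": ">=8" if i % 2 else "<8"}
    for i in range(100)
]
controls = frequency_match(cases, candidates, n_controls=10)
```

Sampling within strata keeps the age and fasting distribution of controls aligned with that of the cases without pairing individuals.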

Aim 2: Study Population:

1) Incident PCa cases with blood collected in 1982;

2) Localized and low-grade cases;

3) Blood volume >100 ml;

Cases:

1) Status in mortality file: 1) PCa death; 2) survived less than 10 years

Control Matching criteria:

1) Same age at diagnosis

2) Same fasting status: >=8 hrs or <8 hrs

3) Same Gleason category: 2-6 or 7

4) Controls are alive now and have survived more than 10 years.

5) 1:1 match; program: SAS proc sql and hash table
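
The 1:1 match for Aim 2 can be sketched in Python with a dictionary playing the role of the hash table. This is an illustrative sketch only: the function `one_to_one_match`, the record layout (`age_dx`, `fasting`, `gleason_cat`), and the toy records are hypothetical, and the order in which the study's program assigned controls is not stated in the report.

```python
from collections import defaultdict


def one_to_one_match(cases, controls):
    """Pair each case with one unused control that has the same matching key.

    The key (age at diagnosis, fasting status, Gleason category) follows the
    criteria listed above; the record layout is a hypothetical illustration.
    """
    def key(person):
        return (person["age_dx"], person["fasting"], person["gleason_cat"])

    # Hash table: matching key -> list of controls not yet used.
    available = defaultdict(list)
    for control in controls:
        available[key(control)].append(control)

    pairs, unmatched = [], []
    for case in cases:
        bucket = available[key(case)]
        if bucket:
            pairs.append((case, bucket.pop()))  # each control is used once
        else:
            unmatched.append(case)
    return pairs, unmatched


# Toy data (invented for illustration only).
cases = [
    {"id": "case1", "age_dx": 68, "fasting": ">=8", "gleason_cat": "2-6"},
    {"id": "case2", "age_dx": 68, "fasting": ">=8", "gleason_cat": "2-6"},
    {"id": "case3", "age_dx": 72, "fasting": "<8", "gleason_cat": "7"},
]
controls = [
    {"id": "ctrl1", "age_dx": 68, "fasting": ">=8", "gleason_cat": "2-6"},
    {"id": "ctrl2", "age_dx": 72, "fasting": "<8", "gleason_cat": "7"},
]
pairs, unmatched = one_to_one_match(cases, controls)
```

Cases whose stratum has no remaining control stay unmatched, which is how a 1:1 design can end with fewer matched pairs than eligible cases.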

In summary, for Aim 1 we selected 100 cases who had localized, high-grade PCa (clinical
stage T1-T3, Gleason grade 8-10) or metastatic PCa (clinical stage T4N1M1, Gleason grade
2-10). We matched 100 controls from PHS participants who were cancer-free or whose cancer
was diagnosed after the cases in the same matching group. For Aim 2, 48 eligible cases
were identified from PHS; these men had localized, low-grade PCa (clinical stage T1-T3, Gleason
grade 2-7) and died of PCa within 10 years after diagnosis. Of these, 43 were matched
with controls who had localized, low-grade PCa (clinical stage T1-T3, Gleason grade 2-7) and
were alive at the end of this study or had survived for more than 10 years (Table 1).


Table 1: Final result of sample selection

Classification                                  Sample size (N)   Matched controls
Aim 1 (All population)
  Localized T1-T3 & Gleason 8-10
    Died of PCa                                 43
    Alive                                       57
  Metastatic PCa (T4N1M1)
    Died of PCa                                 51
    Alive                                       30
  Total                                         181
  Healthy controls                              100
Aim 2 (Incident PCa with 1982 blood)
  Localized T1-T3 & Gleason 2-6, 7
    Died of PCa                                 48
    Long-term survivor (10 yr+) controls        487               43

Following this matching procedure, we selected a total of 372 blood samples for analysis
(Aim 1: 181 cases and 100 matched controls; Aim 2: 48 cases and 43 matched controls). 28 QC
samples were also included. After checking with the blood lab staff, we were able to
locate only 329 of the 372 eligible samples, along with 40 quality control samples
from the lab. We then decided to add 31 samples from PCa patients who died of other
cancers. We sent a final total of 400 samples to Dr. Clish’s lab in November 2014.

Power consideration:

The data generated in Aims 1a, 1b, and 2 all come from a matched/paired design. Hence, we use the
same power calculation formula (Cohen, 1988) to estimate power given the sample size and
type I error rate, which we set at 0.05. The aims differ only in effect size, which is unknown
until we obtain the data, so we evaluated a range of effect sizes. In our original plan, we had
n=50 subjects in each of the 2 groups; in the current plan, we have n=100 subjects per group.
The power curves in the following figure show that power improves substantially when the
sample size is increased from n=50 to n=100 subjects per group. For n=50 per group, the power
would be >0.8 as long as the effect size is >0.41. For n=100 per group, the power would be >0.8
as long as the effect size is >0.29.

Comparing  powers  between  nPerGroup=50  and  nPerGroup=100 
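
The thresholds quoted above can be checked approximately with a short Python sketch. The report cites Cohen (1988) without giving the formula, so the version below is an assumption: a two-sided paired test with the normal approximation, where the effect size is Cohen's d for the within-pair differences. An exact calculation based on the noncentral t distribution gives slightly lower power, so the values here are rough checks rather than a reproduction of the power curves.

```python
from math import sqrt
from statistics import NormalDist


def paired_power(effect_size, n_pairs, alpha=0.05):
    """Approximate two-sided power of a paired test (normal approximation).

    effect_size is Cohen's d for the within-pair differences.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = effect_size * sqrt(n_pairs)  # mean of the test statistic under H1
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


# Effect sizes at the reported thresholds, plus the smaller one at n=50.
for n, d in [(50, 0.41), (100, 0.29), (50, 0.29)]:
    print(f"n={n}, d={d}: approximate power = {paired_power(d, n):.2f}")
```

Under this approximation both reported thresholds give power just above 0.8, while an effect size of 0.29 with only 50 pairs falls well short, which is consistent with the motivation for doubling the sample size.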


Some concerns regarding the PI’s access to the HPFS data for validation:

Because the HPFS has separate funding to measure metabolomics for its PCa cases and
controls, we decided to focus this grant solely on the PHS to avoid redundant work. We will
still validate the model with the HPFS data, even though the DOD is not paying for that analysis.
To reassure the DOD that the PI can do so, we have a letter of agreement from Dr. Mucci, the
co-leader of the HPFS SPORE project, confirming that the PI, Dr. Ma, will be granted access to
the HPFS data so that we can complete the validation.

II.  Completed/ongoing  studies  &  results 

a. Insulin-like  growth  factor  (IGF)  pathway  genetic  polymorphisms,  circulating  IGF1  and 
IGFBP3  levels  and  prostate  cancer  survival 

We conducted a kernel machine pathway analysis to evaluate whether 530 tagging
single-nucleotide polymorphisms (SNPs) in 26 IGF pathway-related genes were collectively
associated with prostate cancer mortality among 5,887 prostate cancer patients (704 prostate
cancer deaths) from 7 cohorts in the NCI Breast and Prostate Cancer Cohort Consortium (BPC3).

The IGF signaling pathway was associated with prostate cancer mortality (P=0.03), and the SNP
sets of IGF2-AS and SSTR2 were the main contributors (both P=0.04) (Table 5). In SNP-specific
analysis, 36 SNPs were associated with prostate cancer mortality with Ptrend<0.05, but only 3
SNPs in IGF2-AS remained significant after gene-based corrections. Two of the three SNPs
were in perfect linkage disequilibrium (r²=1 for rs1004446 and rs3741211), whereas the third,
rs4366464, was independent (r²=0.03). The hazard ratios (HRs) per additional risk allele
were 1.19 (95% CI 1.06-1.34; Ptrend=0.003) for rs3741211 and 1.44 (95% CI 1.20-1.73;
Ptrend=0.0001) for rs4366464. Rs4366464 remained significant after correction for all the SNPs
tested (Ptrend,corr=0.04, Meff=424). Pre-diagnostic circulating levels of IGF1 (HR for highest
vs. lowest quartile 0.71, 95% CI 0.48-1.04) and IGFBP3 (HR 0.93; 95% CI 0.65-1.34) were not
associated with prostate cancer mortality.
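
The corrected P value for rs4366464 is consistent with a Bonferroni-style adjustment in which the nominal P for trend is multiplied by the effective number of independent tests (Meff). The report does not spell out the formula, so this is an assumption; the second helper likewise assumes the usual log-additive reading of a per-allele hazard ratio. Both function names are hypothetical, and because the inputs are the rounded values reported above, the agreement is approximate.

```python
def meff_corrected_p(p_nominal, m_eff):
    """Bonferroni-style correction using the effective number of tests."""
    return min(1.0, p_nominal * m_eff)


def hazard_ratio_for_alleles(hr_per_allele, n_alleles):
    """Log-additive model: each extra risk allele multiplies the hazard."""
    return hr_per_allele ** n_alleles


p_corr = meff_corrected_p(0.0001, 424)      # reported Ptrend and Meff
hr_two = hazard_ratio_for_alleles(1.44, 2)  # carrier of two risk alleles
print(f"corrected P = {p_corr:.2f}; HR for two alleles = {hr_two:.2f}")
# prints: corrected P = 0.04; HR for two alleles = 2.07
```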

The  manuscript  has  been  published  by  JNCI  (June,  2014). 

b. Characterization  of  energy-related  biomarkers  measured  before  and  after  PCa  diagnosis 
in  predicting  all-cause  and  PCa-specific  mortality. 

In  the  PHS,  we  defined  “high  energetic  risk”  as  BMI>25  kg/m2  and  elevated  C-peptide  levels  (in 
the  highest  quartile).  We  found  that  this  “energetic  risk”  significantly  predicted  PCa  mortality 
among  men  with  localized  disease  at  diagnosis  independent  of  clinical  characteristics.  We 
replicated  this  association  in  an  independent  cohort,  the  Health  Professionals  Follow-up  Study 
(HPFS). 
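
The “high energetic risk” definition can be written as a simple rule. The sketch below is illustrative only: the function name, the toy BMI and C-peptide values, and the use of a sample-based upper-quartile cut point from `statistics.quantiles` are assumptions, since the report does not state how the quartile boundary was derived.

```python
from statistics import quantiles


def high_energetic_risk(bmi, c_peptide, c_peptide_q3):
    """True when BMI > 25 kg/m2 and C-peptide lies in the highest quartile."""
    return bmi > 25 and c_peptide > c_peptide_q3


# Toy values (hypothetical, for illustration only).
c_peptides = [0.8, 1.1, 1.4, 1.6, 1.9, 2.3, 2.8, 3.5]
bmis = [23.0, 27.5, 24.1, 26.0, 29.3, 22.4, 31.0, 26.8]

q3 = quantiles(c_peptides, n=4)[2]  # upper quartile cut point
flags = [high_energetic_risk(b, c, q3) for b, c in zip(bmis, c_peptides)]
```

Only men who meet both conditions are flagged; being overweight with a C-peptide level below the top quartile does not count as high energetic risk under this definition.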

In both cohorts, we found that incorporating this “energetic risk” into the D’Amico risk score
(defined by three clinical parameters: PSA, clinical stage, and Gleason score) significantly
improved the prediction of PCa-specific mortality and all-cause mortality in men with an initial
diagnosis of localized cancer; the C-statistic for PCa-specific mortality improved from 0.72
to 0.78 (P<0.001). Moreover, “energetic risk” identified ~20% of patients who are at high risk of
disease-specific mortality but are classified as low risk according to clinical characteristics. The
resulting paper is undergoing peer review. One major concern raised was potential
confounding by comorbidity and treatments. We therefore carefully evaluated the impact of
these two factors in both cohorts and found little change in the overall results.

c.  Pre-diagnostic  Obesity,  Smoking  and  PCa  survival 

Although obesity and smoking have not been strongly associated with prostate cancer (PCa)
incidence, emerging evidence links them to increased PCa-specific mortality. We investigated the
associations of pre-diagnostic BMI and smoking status with risk of progression from time of PCa
diagnosis to fatal outcome among 10,106 PCa patients from the NCI Breast and Prostate Cancer
Cohort Consortium (BPC3).


Figure 1. Age-adjusted cumulative incidence of a) PCa-specific mortality and b) total mortality, stratified by BMI
categories and smoking status, among men with PCa in the BPC3 study


Cumulative  incidence  curves  show  the  probability  of  prostate  cancer-specific  mortality  or  total  mortality  after 
diagnosis  according  to  baseline  smoking  and  BMI  categories,  controlling  for  age  at  diagnosis. 

The  cumulative  PCa-specific  and  overall  mortality  was  much  higher  in  current  smokers  as 
compared  with  never  or  former  smokers.  In  contrast,  the  difference  according  to  BMI  categories 
among  non-current  smokers  is  much  smaller,  but  still  apparent  for  total  mortality.  This  study 
provides  further  evidence  that  overweight/obesity  and  smoking  history  prior  to  diagnosis  are 
related  to  poor  survival  among  patients  with  PCa.  The  manuscript  has  been  developed  and  is  now 
circulating  among  coauthors. 

d. Type 2 diabetes before and after PCa diagnosis and PCa-specific and all-cause mortality.

Using the same cohort data from BPC3, we also observed that new-onset type 2 diabetes (T2D)
after PCa diagnosis was associated with improved survival among PCa cases (Table 2). We plan to
examine this finding together with T2D-related SNPs and C-peptide information.


Table 2: Diabetes status and prostate cancer/other mortality in the BPC3 cohort

                        Prostate cancer-specific mortality   Other mortality
Diabetes status         HR     95% CI      p value           HR     95% CI      p value
Never                   Ref                                  Ref
Before PCa diagnosis    1.02   0.95-1.09   0.583             1.25   1.08-1.46   0.004
After PCa diagnosis     0.72   0.61-0.84   <.001             0.62   0.41-0.93   0.021

Multivariate model adjusted for age at diagnosis (continuous), smoking status (never, former, current), BMI
(18-22.9, 23-24.9, 25-27.9, 28-29.9, 30-34.9 kg/m2), drinking status (never, <15 g/day, >15 and <30 g/day, >30 g/day),
diabetes status (never, baseline, new), cohort (ATBC, CPS2, EPIC, HPFS, MCCS, MEC, PHS, PLCO), and duration
between baseline and PCa diagnosis (continuous).


e. GWAS-identified type 2 diabetes SNPs and risk of progression to fatal prostate cancer

This study will build on our recently published study, which used the BPC3 genome-wide
association study of 2,782 advanced PCa cases and 4,458 controls to evaluate the association
between 36 T2D susceptibility loci and incident PCa risk (Machiela M et al., Am J Epidemiol 2012).
Ten T2D markers near 9 loci (NOTCH2, ADCY5, JAZF1, CDKN2A/B, TCF7L2, KCNQ1, MTNR1B,
FTO, and HNF1B) were nominally associated with PCa risk (P < 0.05); the association for
rs757210 at the HNF1B locus was significant when multiple comparisons were accounted for
(adjusted P = 0.001). Genetic risk scores weighted by the T2D log odds ratio and multilocus
kernel tests also indicated a significant relation between T2D variants and PCa risk. These T2D
risk variants have not been fully investigated for PCa progression to fatal outcome. Also, few
studies have T2D phenotypes or sufficient power to assess whether T2D status mediates the
relationship between T2D risk variants and PCa risk. We will fully evaluate these genes and
mediation through or interaction with T2D for fatal PCa. We will also explore the association
between T2D risk variants and the risk of specific types of PCa (advanced PCa with death,
advanced PCa with long-term survival of 10+ years, and localized PCa). For our analysis, all 36
T2D SNPs have been extracted for 7,240 participants, including both cases and controls, from the
imputed data files of the BPC3 advanced prostate cancer GWAS. The imputation was done using
HapMap 2 Rel 22 CEU phased data as the reference panel.
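
A genetic risk score weighted by the T2D log odds ratio, as mentioned above, is commonly computed as the sum over SNPs of the risk-allele dosage times the log odds ratio. The sketch below assumes that form; the SNP labels, odds ratios, and dosages are hypothetical placeholders rather than values from the study, and imputed dosages are allowed to be fractional.

```python
from math import log


def weighted_grs(dosages, odds_ratios):
    """Sum of risk-allele dosages weighted by the log odds ratio per SNP."""
    return sum(dosages[snp] * log(odds_ratios[snp]) for snp in odds_ratios)


# Hypothetical T2D odds ratios and imputed dosages (0-2 risk alleles).
t2d_or = {"snp_a": 1.10, "snp_b": 1.25, "snp_c": 1.05}
person = {"snp_a": 2.0, "snp_b": 1.0, "snp_c": 0.3}

score = weighted_grs(person, t2d_or)
```

Weighting by the log odds ratio lets SNPs with larger T2D effects contribute more to the score than an unweighted allele count would.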

f.  Elevation  of  circulating  branched-chain  amino  acids  is  an  early  event  in  human 
pancreatic  adenocarcinoma  development. 

The pancreatic cancer group at the Dana-Farber Cancer Institute has been working on the
metabolomics of pancreatic cancer development, and we are working closely with this group
on metabolomics data analysis.

This study profiled metabolites in prediagnostic plasma from individuals with pancreatic
cancer (cases) and matched controls from four prospective cohort studies. We find that elevated
plasma levels of branched-chain amino acids (BCAAs) are associated with a greater than twofold
increased risk of future pancreatic cancer diagnosis. This elevated risk was independent of
known  predisposing  factors,  with  the  strongest  association  observed  among  subjects  with 
samples  collected  2  to  5  years  before  diagnosis,  when  occult  disease  is  probably  present.  We 
show  that  plasma  BCAAs  are  also  elevated  in  mice  with  early-stage  pancreatic  cancers  driven  by 
mutant  Kras  expression  but  not  in  mice  with  Kras- driven  tumors  in  other  tissues,  and  that 
breakdown  of  tissue  protein  accounts  for  the  increase  in  plasma  BCAAs  that  accompanies  early- 
stage  disease.  Together,  these  findings  suggest  that  increased  whole-body  protein  breakdown  is 
an  early  event  in  development  of  pancreatic  ductal  adenocarcinoma  (PDAC). 

This  manuscript  has  been  published  in  Nature  Medicine  (Sep,  2014). 

What  opportunities  for  training  and  professional  development  has  the  project  provided? 

This project has provided funding and research opportunities for several doctoral students and
postdoctoral fellows at the Harvard T.H. Chan School of Public Health.

Yin Cao graduated from the Doctor of Science program in the Department of Epidemiology and is
currently a postdoctoral fellow in the Department of Nutrition. One of her thesis papers was
based on and supported by the current project.

Changzheng Yuan is a doctoral candidate in the Departments of Nutrition and Epidemiology. She is
now working on three research topics related to this project, mainly focusing on obesity, T2DM,
and genetic variants related to prostate cancer development.

Meng Yang is a postdoctoral fellow in the Department of Nutrition. She is currently studying
BMI trajectory, dietary factors, metabolic biomarkers, and prostate cancer survivorship.

CY and MY also work closely with the project leader and statisticians to discuss the study design
and sample selection.

How  were  the  results  disseminated  to  communities  of  interest? 

Nothing  yet  to  report. 

What  do  you  plan  to  do  during  the  next  reporting  period  to  accomplish  the  goals? 

We  plan  to  conduct  the  analyses  based  on  the  proposed  aims  of  metabolomic  analysis  after 
receiving  the  lab  analyses  results. 


4.  Impact 

What  was  the  impact  on  the  development  of  the  principal  discipline(s)  of  the  project? 

The  most  notable  strength  of  our  proposal  is  the  use  of  unbiased  metabolomic  profiling  to 
distinguish  lethal  from  indolent  disease,  a  major  challenge  in  prostate  cancer  research.  Prostate 
cancer  accounts  for  25%  of  all  newly  diagnosed  cancers  and  9%  of  all  cancer  deaths  in  men, 
making  it  the  most  commonly  diagnosed  and  second  most  lethal  cancer  for  men  in  the  United 
States. Widespread use of PSA screening has changed the stage and grade distribution of
disease  at  diagnosis  but  appears  to  have  only  modest  effects  on  prostate  cancer  mortality.  In  the 
United  States,  80  to  90%  of  prostate  cancer  cases  are  confined  to  the  prostate  and  two-thirds  of 
the  cases  are  localized  or  regional  disease  and  low-  to  moderate-  grade  at  diagnosis.  Current  use 
of  clinical  features  cannot  always  reliably  distinguish  patients  who  will  die  from  prostate  cancer 
from  those  who  do  not.  Thus,  it  is  important  to  identify  novel  markers  specifically  associated 
with  lethal  prostate  cancer  and  lifestyle  factors  that  influence  disease  progression. 

The  short-term  (1-3  years)  impact  of  this  prospective  study  will  be  to  provide  a  deeper 
understanding  of  the  mechanisms  of  lethal  PC  phenotype  so  that  relevant  biological  pathways 
can  be  revealed  and  new  biomarkers  can  be  developed.  Findings  from  this  study  will  help  better 
understand  the  mechanisms  of  action  of  energy  balance  in  tumor  growth  and  metastasis  and 
reveal  novel  biomarkers  and  pathway  effects.  This  line  of  research  is  especially  important  in  the 
context of highly prevalent obesity and hyperinsulinemia among nondiabetic U.S. adults in recent
decades.  Findings  of  this  study  could  then  provide  biological  rationale  that  risk  of  lethal  PC 
could  be  reduced  before  or  at  early  stage  of  the  disease  by  modifying  metabolic  risk  through 
physical  activity,  healthy  diets,  and  other  innovative  approaches. 

The long-term (3-8 years) goal of our biomarker study is to provide targeted patient
identification and stratification to link patients with common biomarkers to the appropriate
therapy. Our research could be extended to larger validation studies using novel biomarkers for
better stratification methods than the current clinical parameters (e.g., D’Amico risk) to link
patients with common biomarkers to the appropriate personalized prevention and therapeutic
strategies.  The  new  risk  stratifications  could  then  help  to  identify  candidates  for  randomized 
trials  of  novel  agents  targeting  metabolic  dysregulation  and  pathways.  Ultimately,  these  novel 
biomarkers  could  also  be  candidates  for  response  to  such  interventions. 

The  overarching  challenges  and  focus  areas  of  this  proposal  are  discovery  and  validation  of 
biomarkers  for  the  detection  and  prediction  of  lethal  prostate  cancer.  This  study  addresses  the 
overarching challenge and one of the FY11 PCRP focus areas: discovery and validation of
biomarkers  for  the  detection  and  prediction  of  lethal  prostate  cancer  from  indolent  disease,  so 
that  men  with  indolent  disease  could  be  spared  from  over-treatment,  whereas  those  with  high  risk 
potential  for  lethal  phenotype  could  receive  appropriate  personalized  interventions  at  an  early 
stage. 

What  was  the  impact  on  other  disciplines? 

Nothing  yet  to  report. 

What  was  the  impact  on  technology  transfer? 

Nothing  yet  to  report. 

What  was  the  impact  on  society  beyond  science  and  technology? 

Nothing  yet  to  report. 


5.  Changes/Problems 

Changes  in  approach  and  reasons  for  change 

As  mentioned  above,  we  now  only  focus  on  the  PHS  cohort. 

Actual  or  anticipated  problems  or  delays  and  actions  or  plans  to  resolve  them 

Nothing  yet  to  report. 

Changes  that  had  a  significant  impact  on  expenditures 

Nothing  yet  to  report. 

Significant  changes  in  use  or  care  of  human  subjects,  vertebrate  animals,  biohazards, 
and/or  select  agents 

Nothing  yet  to  report. 


6.  Products 

Journal  Publications: 

1)  Machiela  MJ,  Lindstrom  S,  Allen  NE,  et  al.  Association  of  type  2  diabetes  susceptibility 
variants  with  advanced  prostate  cancer  risk  in  the  Breast  and  Prostate  Cancer  Cohort 
Consortium. Am J Epidemiol. 2012 Dec 15;176(12):1121-9. PMID: 23193118

2)  Song  Y,  Chavarro  JE,  Cao  Y,  et  al.  Whole  Milk  Intake  Is  Associated  with  Prostate  Cancer- 
Specific  Mortality  among  U.S.  Male  Physicians.  J  Nutr.  2013  Feb;143(2):  PMID:23256145 

3) Cao Y, Lindstrom S, Schumacher F, et al. Insulin-like growth factor (IGF) pathway
genetic  polymorphisms,  circulating  IGF1  and  IGFBP3  levels  and  prostate  cancer  survival. 
JNCI.  2014  Jun;  106(6):  PMID:  24824313 

4)  Mayers  J,  Wu  C,  Clish  C,  et  al.  Elevation  of  circulating  branched-chain  amino  acids  is  an 
early  event  in  human  pancreatic  adenocarcinoma  development.  Nature  medicine.  2014  Sep. 
20(10):  PMID:  25261994 

Other  publications,  conference  papers,  and  presentations: 

Yuan  C,  Cao  Y,  Chavarro  J,  Lindstrom  S  . . .,  Ma  J.  Prediagnostic  body-mass  index,  smoking  and 
prostate  cancer  survival  in  a  multi-cohort  consortium  study.  A  poster  was  presented  at  the 
Frontier  of  Cancer  Prevention  Research  2013  Conference  in  Washington  DC,  Nov.  2013. 

7. Participants & Other Collaborating Organizations

What  individuals  have  worked  on  the  project? 


Name: Jing Ma
Project Role: PI
Researcher Identifier: 0000-0002-9132-0741
Nearest person month worked: 2.52
Contribution to Project: As the project PI, Dr. Ma has led the weekly meetings for project
team members. She directs and is responsible for the overall study design and performance
and for report and manuscript preparation.
Funding Support:
R01CA141298 (Stampfer) - 0.6 Calendar Month
R01CA137178 (Chan) - 0.24 Calendar Month
W81XWH-11-1-0529 (Chavarro) - 0.24 Calendar Month
U45 CA10006 (Hu) - 7.08 Calendar Month
U01CA155340 (Han-Sub) - 0.12 Calendar Month

Name: Jorge Chavarro
Project Role: Other Significant Contributor
Researcher Identifier: N/A
Nearest person month worked: No Measurable Effort
Contribution to Project: Dr. Chavarro works closely with Drs. Ma, Clish, and Qiu on data
analysis, interpretation, and manuscript preparation.
Funding Support:
HHSN275201000020C (Hu) - 2.40 Calendar Month
CA-10-006 (Hu) - 2.28 Calendar Month
W81XWH-11-1-0529 (Chavarro) - 4.20 Calendar Month

Name: HayHaiyan Zhang
Project Role: Data Manager
Researcher Identifier: N/A
Nearest person month worked: 3
Contribution to Project: Zhang is responsible for managing the database for the Physicians’
Health Study biobank, case-control selection, and preparation of the biospecimen pulling list.

Has  there  been  a  change  in  the  active  other  support  of  the  PD/PI(s)  or  senior/key  personnel 
since  the  last  reporting  period? 

Nothing  to  Report. 

What  other  organizations  were  involved  as  partners? 

Clary Clish, subcontract PI conducting metabolic analysis at the Metabolite Profiling Platform,
Broad Institute of MIT/Harvard. Dr. Clish is an expert in metabolic profiling assay development
and validation, and oversees the assay development, measurement, and data annotation at his
laboratory.

8. Special Reporting Requirements

COLLABORATIVE  AWARDS:  For  collaborative  awards,  independent  reports  are  required 
from  BOTH  the  Initiating  PI  and  the  Collaborating/Partnering  PI.  A  duplicative  report  is 
acceptable;  however,  tasks  shall  be  clearly  marked  with  the  responsible  PI  and  research  site.  A 
report  shall  be  submitted  to  https://ers.amedd.army.mil  for  each  unique  award. 

QUAD CHARTS: If applicable, the Quad Chart (available on https://www.usamraa.army.mil)
should  be  updated  and  submitted  with  attachments. 


No. 


9.  Appendices 

1. Machiela MJ, Lindstrom S, Allen NE, Haiman CA, Albanes D, Barricarte A, Berndt SI,
Bueno-de-Mesquita HB, Chanock S, Gaziano JM, Gapstur SM, Giovannucci E, Henderson BE, Jacobs
EJ, Kolonel LN, Krogh V, Ma J, Stampfer MJ, Stevens VL, Stram DO, Tjonneland A, Travis R,
Willett WC, Hunter DJ, Le Marchand L, Kraft P. Association of type 2 diabetes susceptibility
variants with advanced prostate cancer risk in the Breast and Prostate Cancer Cohort Consortium.
Am J Epidemiol. 2012 Dec 15;176(12):1121-9. PMID: 23193118

2. Song Y, Chavarro JE, Cao Y, Qiu W, Mucci L, Sesso HD, Stampfer MJ, Giovannucci E, Pollak
M, Liu S, Ma J. Whole Milk Intake Is Associated with Prostate Cancer-Specific Mortality among
U.S. Male Physicians. J Nutr. 2013 Feb;143(2): PMID: 23256145

3. Cao Y, Lindstrom S, Schumacher F, Stevens VL, Albanes D, Berndt SI, Boeing H,
Bueno-de-Mesquita HB, Canzian F, Chamosa S, Chanock SJ, Diver WR, Gapstur SM, Gaziano JM,
Giovannucci EL, Haiman CA, Henderson B, Johansson M, Le Marchand L, Palli D, Rosner B,
Siddiq A, Stampfer MJ, Stram DO, Tamimi R, Travis RC, Trichopoulos D, Willett WC, Yeager
M, Kraft P, Hsing AW, Pollak M, Lin X, Ma J. Insulin-like Growth Factor Pathway Genetic
Polymorphisms, Circulating IGF1 and IGFBP3 Levels and Prostate Cancer Survival. J Natl
Cancer Inst. 2014 May 13;106(6).

4. Mayers JR, Wu C, Clish CB, Kraft P, ... Ma J, ... Wolpin BM. Elevation of circulating
branched-chain amino acids is an early event in human pancreatic adenocarcinoma development.
Nature Medicine. 2014 Oct;20:1193-1198.

5. Yuan C, Cao Y, Chavarro J, Lindstrom S, Qiu W, Drake B, Willett W, Hsing A, Kibel A, Rosner
B, Stampfer M, Kraft P, Ma J. Prediagnostic body-mass index, smoking and prostate cancer
survival in a multi-cohort consortium study. Manuscript circulated to co-authors.
Prediagnostic Body-mass Index (BMI), Smoking and Prostate Cancer Survival

Changzheng Yuan1, Yin Cao1, Jorge Chavarro1,2, Sara Lindstrom2,3, Peter Kraft2,3, Jing Ma2,3

(on  behalf  of  the  Breast  and  Prostate  Cancer  Cohort  Consortium) 

1.  Department  of  Nutrition,  Harvard  School  of  Public  Health,  Boston,  MA,  2.  Channing  Division  of  Network  Medicine,  Brigham  and  Women's  Hospital,  Harvard  Medical  School,  Boston,  MA,  3.  Department  of  Epidemiology,  Harvard  School  of  Public  Health,  Boston,  MA 


BACKGROUND 


>  Meta-analysis  linked  elevated  BMI  with  increased  risk  of  PSA  recurrence  or 
prostate  cancer  (PC)-specific  mortality.  However,  short  follow-up  and  lack  of  control 
for  smoking  are  major  limitations  in  many  of  the  clinical  studies. 

>  Few  prospective  studies  have  sufficient  power  to  investigate  the  relationship 
between  obesity  and  lethal  PC  by  time  of  BMI  measurement  before  PC  diagnosis  and 
by  smoking  status. 

>  This  study  aimed  to  investigate  the  associations  of  pre-diagnostic  BMI  with  risk  of 
progression  from  time  of  PC  diagnosis  to  fatal  outcome,  to  study  whether  the 
relationship  differs  by  time  of  BMI  measurement  before  PC  diagnosis  and  to  assess 
effect  modification  by  smoking  status  on  BMI  and  PC  survival. 


METHODS 


> The study included 10,106 PC cases from the NCI Breast and Prostate Cancer Cohort
Consortium (BPC3).

> BMI and smoking status were estimated at baseline before PC diagnosis. Deaths among PC
patients were categorized into deaths from PC and deaths from other causes.

> We conducted the analysis in 3 parts according to exposures: BMI (18-22.9 kg/m2, 23-24.9
kg/m2, 25-27.9 kg/m2, 28-29.9 kg/m2, 30-34.9 kg/m2, 35 kg/m2+), smoking (never, former,
current) and their joint effect, respectively.

> A competing-risks regression model was used to account for other causes of death. Each
analysis was performed on prostate cancer-specific mortality and other mortality separately.
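The poster does not say which competing-risks regression was fitted, so the following is only a minimal sketch of the underlying idea, not the authors' model: a nonparametric cumulative incidence estimate in which a death from another cause removes a man from the risk set instead of being treated as ordinary censoring. The function name, the event coding (0 = censored, 1 = PC death, 2 = other death) and the toy data are assumptions made for illustration.

```python
from collections import Counter


def cumulative_incidence(times, events, cause):
    """Nonparametric cumulative incidence of one cause under competing risks.

    times  -- follow-up time from diagnosis for each subject
    events -- hypothetical coding: 0 = censored, 1 = PC death, 2 = other death
    cause  -- event code whose cumulative incidence is wanted
    Returns [(time, cumulative incidence)] at each time where any death occurs.
    """
    # Tally censorings and deaths at each distinct follow-up time.
    by_time = {}
    for t, e in zip(times, events):
        by_time.setdefault(t, Counter())[e] += 1

    at_risk = len(times)
    event_free = 1.0  # probability of no death of any cause before time t
    cif = 0.0
    curve = []
    for t in sorted(by_time):
        counts = by_time[t]
        deaths_cause = counts[cause]
        deaths_any = sum(n for code, n in counts.items() if code != 0)
        if deaths_any:
            # A death from `cause` at t requires surviving all causes up to t.
            cif += event_free * deaths_cause / at_risk
            event_free *= 1.0 - deaths_any / at_risk
            curve.append((t, cif))
        at_risk -= sum(counts.values())
    return curve


# Toy data (not from the study): five men followed for up to 5 years.
times = [1, 2, 3, 4, 5]
events = [1, 2, 0, 1, 0]
pc_curve = cumulative_incidence(times, events, cause=1)
other_curve = cumulative_incidence(times, events, cause=2)
```

Running the estimate once per cause mirrors the poster's choice to analyze PC-specific and other mortality separately; a regression model would add covariates such as BMI category on top of this quantity.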


> Higher prediagnostic BMI was related to higher risk of dying
from PC after diagnosis; the positive trend was mainly observed
among men whose BMI was measured >5 years before PC diagnosis.

> Smokers (both former and current) had significantly higher risk of
either PC-specific mortality or other mortality, regardless of the
time of measuring smoking status before PC diagnosis.

> The effect of BMI on PC survival is modified by smoking status.
The positive trend of BMI with PC mortality was observed
mainly among never and former smokers, but not among current
smokers.


RESULTS 


Table 1. Characteristics of subjects according to death status

Variables                                   Total prostate  Prostate        Other death     Survivors
                                            cancer cases    cancer death    (n=1,886)       (n=7,213)
                                            (n=10,106)      (n=1,007)

Age at diagnosis (mean±SD, yr)              68.7±6.7        69.5±7.2        71.7±6.3        [illegible]
BMI (mean±SD, kg/m2)                        26.2±3.3        26.3±3.5        26.1±3.5        26.2±3.3
Duration between baseline and
  diagnosis (mean±SD, yr)                   7.3±4.7         6.9±4.3         7.4±4.6         7.3±4.7
Follow up time (mean±SD, yr)                8.2±3.9         5.1±3.6         6.5±3.9         9.1±3.6
Cohort (%)
  ATBC                                      10.2            28.6            24.2            4.0
  CPS2                                      22.2            12.3            18.2            [illegible]
  EPIC                                      15.3            20.8            9.3             16.1
  HPFS                                      10.9            6.7             11.5            11.3
  MCCS                                      9.5             5.7             7.0             10.6
  MEC                                       6.5             2.8             6.6             7.0
  PHS                                       13.5            17.2            14.7            12.6
  PLCO                                      12.0            6.1             8.5             13.8
BMI category (%)
  18<BMI<25                                 38.6            37.8            41.4            37.9
  25<BMI<30                                 49.0            49.3            45.8            49.8
Figure 1. Effect of BMI (smoking) on PC mortality and other mortality (overall and by measurement time before PC diagnosis)
1. Model 1 adjusted for age at diagnosis (continuous), drinking status (never, <15 g/day, >15 and <30 g/day, >30 g/day),
diabetes status (never, baseline, new), cohort (ATBC, CPS2, EPIC, HPFS, MCCS, MEC, PHS, PLCO), duration between
baseline and PC diagnosis (continuous);
Elevation of circulating branched-chain amino acids is an early
event in human pancreatic adenocarcinoma development

Jared R Mayers1,2,23, Chen Wu3-5,23, Clary B Clish6,23, Peter Kraft5,7, Margaret E Torrence1,2,
Brian P Fiske1,2, Chen Yuan4, Ying Bao8, Mary K Townsend8, Shelley S Tworoger5,8, Shawn M Davidson1,2,
Thales Papagiannakopoulos1,2, Annan Yang9, Talya L Dayton1,2, Shuji Ogino4,5,10, Meir J Stampfer5,8,11,
Edward L Giovannucci5,8,11, Zhi Rong Qian4, Douglas A Rubinson4, Jing Ma5,8, Howard D Sesso5,12,
John M Gaziano12,13, Barbara B Cochrane14, Simin Liu15,16, Jean Wactawski-Wende17, JoAnn E Manson5,8,12,
Michael N Pollak18,19, Alec C Kimmelman9, Amanda Souza6, Kerry Pierce6, Thomas J Wang20,
Robert E Gerszten6,21, Charles S Fuchs4,8, Matthew G Vander Heiden1,2,4,6 & Brian M Wolpin4,22


Most  patients  with  pancreatic  ductal  adenocarcinoma  (PDAC) 
are  diagnosed  with  advanced  disease  and  survive  less  than 
12  months1.  PDAC  has  been  linked  with  obesity  and  glucose 
intolerance2-4,  but  whether  changes  in  circulating  metabolites 
are  associated  with  early  cancer  progression  is  unknown. 

To  better  understand  metabolic  derangements  associated  with 
early  disease,  we  profiled  metabolites  in  prediagnostic  plasma 
from  individuals  with  pancreatic  cancer  (cases)  and  matched 
controls  from  four  prospective  cohort  studies.  We  find  that 
elevated  plasma  levels  of  branched-chain  amino  acids  (BCAAs) 
are  associated  with  a  greater  than  twofold  increased  risk  of 
future  pancreatic  cancer  diagnosis.  This  elevated  risk  was 
independent  of  known  predisposing  factors,  with  the  strongest 
association  observed  among  subjects  with  samples  collected 
2  to  5  years  before  diagnosis,  when  occult  disease  is  probably 
present.  We  show  that  plasma  BCAAs  are  also  elevated  in  mice 
with  early-stage  pancreatic  cancers  driven  by  mutant  Kras 
expression but not in mice with Kras-driven tumors in other
tissues,  and  that  breakdown  of  tissue  protein  accounts  for 
the  increase  in  plasma  BCAAs  that  accompanies  early-stage 
disease.  Together,  these  findings  suggest  that  increased 
whole-body  protein  breakdown  is  an  early  event  in  development 
of  PDAC. 


PDAC  is  a  leading  cause  of  cancer-related  death  worldwide,  and  most 
patients  have  incurable  disease  at  diagnosis1.  The  best-characterized 
predisposing  factors,  current  tobacco  use  and  a  first-degree  relative 
with  PDAC,  both  impart  an  approximate  1.8-fold  increased  risk  for 
the disease5,6. These risk factors, however, have thus far provided
limited insight into the biology of early disease progression of sporadic
tumors. Development and progression of PDAC is also associated
with altered systemic metabolism including obesity2, glucose
intolerance3,4 and cancer-induced cachexia7. Nevertheless, no systematic
examination of circulating metabolites has been performed
to determine whether altered metabolism may indicate subclinical
pancreatic cancer or inform understanding of early disease progression
when interventions might improve patient outcomes.

Prior  efforts  to  identify  changes  in  circulating  metabolites  related 
to  cancer  have  employed  a  cross-sectional  design,  comparing  cancer- 
free  subjects  to  affected  individuals  with  blood  samples  collected  at 
diagnosis8-10.  This  approach  is  problematic  for  discovery  of  changes 
related to early cancer progression, as consequences of advanced disease
are likely to have an impact on circulating metabolite profiles.
This  is  particularly  true  for  patients  with  pancreatic  cancer,  who 
commonly  have  significant  anorexia,  weight  loss  and  pancreatic 
insufficiency  at  the  time  of  diagnosis1.  To  investigate  how  altered 
metabolism  might  contribute  to  pancreatic  malignancy,  we  profiled 
plasma metabolites in cases with PDAC and matched controls drawn
from  four  prospective  cohort  studies  with  blood  collected  at  least  2 
years  before  cancer  diagnosis  (Supplementary  Table  1).  The  median 
time  between  blood  collection  and  PDAC  diagnosis  was  8.7  years. 


In  conditional  logistic  regression  models,  levels  of  15  metabolites 
were  associated  with  future  diagnosis  of  PDAC  to  P  <  0.05;  three 
metabolites, the BCAAs isoleucine, leucine and valine, were significant
to P ≤ 0.0006, the predefined significance threshold after correction
Figure 1 Plasma metabolites and risk of future pancreatic cancer diagnosis. P values of the log-transformed, continuous metabolite levels from human
plasma comparing pancreatic cancer cases and controls in conditional logistic regression models conditioned on matching factors and adjusted for age
at blood draw (years, continuous), fasting time (<4 h, 4-8 h, 8-12 h, >12 h, missing) and race or ethnicity (white, black, other, missing). The dashed
green line indicates the statistically significant P value threshold after Bonferroni correction for multiple-hypothesis testing, P ≤ 0.0006 (0.05/83).
The dashed blue line indicates P of 0.05. The number of cases and controls analyzed for each metabolite is provided in Supplementary Table 2.
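The Bonferroni threshold quoted in the caption is simple arithmetic: the nominal 0.05 level divided by the 83 metabolites tested. The short sketch below reproduces that number; the function name and the example P values are hypothetical and are not taken from the paper.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


threshold = bonferroni_threshold(0.05, 83)
print(f"{threshold:.6f}")  # 0.000602

# Hypothetical P values, for illustration only.
example_p_values = {"metabolite_A": 0.0002, "metabolite_B": 0.01, "metabolite_C": 0.2}
passing = sorted(name for name, p in example_p_values.items() if p <= threshold)
```

A metabolite with P of 0.01 would clear the nominal 0.05 line in the figure but not the corrected one, which is the distinction between the 15 nominally associated metabolites and the three BCAAs.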
Table 1 Odds ratios for pancreatic cancer by prediagnostic plasma levels of BCAAs

Model^a                      Isoleucine          Leucine             Valine              Total BCAAs^c

Base model
  Extreme quintiles^b        2.11 (1.40-3.18)    2.08 (1.38-3.13)    2.00 (1.37-2.92)    2.13 (1.43-3.15)
  Per s.d.                   1.30 (1.15-1.48)    1.31 (1.14-1.50)    1.23 (1.09-1.39)    1.30 (1.14-1.48)

+ BMI and physical activity
  Extreme quintiles^b        2.05 (1.34-3.12)    2.01 (1.32-3.06)    1.94 (1.31-2.86)    2.06 (1.37-3.09)
  Per s.d.                   1.29 (1.14-1.48)    1.29 (1.12-1.49)    1.21 (1.07-1.38)    1.28 (1.12-1.47)

+ BMI, physical activity and reported diabetes at blood collection
  Extreme quintiles^b        2.00 (1.31-3.05)    1.97 (1.29-2.99)    1.90 (1.28-2.81)    2.01 (1.34-3.03)
  Per s.d.                   1.28 (1.13-1.46)    1.28 (1.11-1.48)    1.20 (1.06-1.37)    1.27 (1.11-1.46)

+ BMI, physical activity, reported diabetes, HbA1c, plasma insulin, proinsulin and C-peptide
  Extreme quintiles^b        1.86 (1.13-3.03)    1.81 (1.11-2.96)    1.67 (1.06-2.63)    1.89 (1.17-3.06)
  Per s.d.                   1.24 (1.06-1.45)    1.24 (1.04-1.47)    1.14 (0.99-1.33)    1.22 (1.04-1.44)

Exclude subjects with reported diabetes or HbA1c >6.5% at blood collection
  Extreme quintiles^b        2.12 (1.37-3.27)    2.16 (1.39-3.35)    1.91 (1.28-2.85)    2.19 (1.44-3.34)
  Per s.d.                   1.33 (1.16-1.52)    1.32 (1.14-1.54)    1.23 (1.08-1.41)    1.31 (1.14-1.51)

Exclude subjects with reported diabetes or HbA1c >6.5% at blood collection and those with
reported diabetes after blood collection
  Extreme quintiles^b        2.18 (1.39-3.43)    2.21 (1.40-3.49)    1.94 (1.27-2.96)    2.25 (1.45-3.49)
  Per s.d.                   1.32 (1.15-1.53)    1.33 (1.13-1.55)    1.22 (1.07-1.40)    1.31 (1.13-1.52)

^a Odds ratio (95% CI) from conditional logistic regression models conditioned on matching factors and adjusted for age at blood draw (years, continuous), fasting time (<4 h,
4-8 h, 8-12 h, >12 h, missing) and race or ethnicity (white, black, other, missing). Subsequent models also adjusted for the indicated measure of energy balance, hyperglycemia
or insulin resistance: body-mass index (kg/m2, continuous), physical activity (metabolic equivalent task-hours per week (MET-h/week), continuous), reported diabetes at blood
collection (yes or no), hemoglobin A1c (HbA1c) (%, continuous), plasma insulin (μIU/ml, continuous), plasma proinsulin (pM, continuous) and plasma C-peptide (ng/ml,
continuous). ^b Odds ratios (95% CI) for the comparison of the fifth quintile to the first quintile (referent) for the circulating BCAAs. ^c Sum of the concentrations of the three
circulating BCAAs, isoleucine, leucine and valine.


for  multiple-hypothesis  testing  (Fig.  1  and  Supplementary  Table  2). 
To  evaluate  the  magnitude  of  risk  for  PDAC  diagnosis,  we  divided 
participants into quintiles of increasing BCAA levels. Compared to
the bottom quintile, subjects in the top quintile had at least a two-fold
increased  risk  of  developing  PDAC  (Table  1,  Supplementary  Table  3 
and  Supplementary  Fig.  1).  As  noted  previously11,  circulating  levels 
Figure 2 Plasma BCAA levels are elevated during subclinical disease. (a) Graph of odds ratio (error bars indicate 95% confidence interval (CI)) for future
pancreatic cancer diagnosis among cohort cases and matched controls comparing highest versus lowest quintiles of circulating BCAA levels stratified
by time from blood collection to the case’s cancer diagnosis. Odds ratio was determined from conditional logistic regression models conditioned on
matching factors and adjusted for age at blood draw (years, continuous), fasting time (<4 h, 4-8 h, 8-12 h, >12 h, missing) and race or ethnicity (white,
black, other, missing). Red horizontal line marks an odds ratio of 1.0. The number of cases and controls in each time period and association results for
the individual BCAAs are provided in Supplementary Table 6. (b) Graph of mean (±s.e.m.) total plasma BCAA concentration in LSL-KrasG12D/+;
LSL-Trp53R172H/+; Pdx1-cre (KPC) mice over time and in littermate controls lacking either LSL-KrasG12D or Pdx1-cre or both. Each control data point is
an average for one mouse over the course of the study (Supplementary Fig. 4b), and values for KPC mice living longer than 19 weeks are averaged for the
>19-weeks time point. For weeks 15-17, n = 6 KPC and n = 9 control, Student’s t-test, P = 0.001. For >19 weeks, n = 4; 11-13 weeks, n = 6;
7-9 weeks, n = 7; 3-5 weeks, n = 9. (c) H&E staining of pancreatic tissue obtained from KP-/-C mice and littermate controls at 3-4 weeks of age. Tissues
are from a control mouse with histologically normal pancreas (left), a KP-/-C mouse with areas of PDAC adjacent to areas of normal pancreas (middle) and
a KP-/-C mouse with areas of PDAC and pancreatic intraepithelial neoplasia (arrowheads; right). Scale bars, 50 μm. (d) Mean (±s.e.m.) body weights at
3-4 weeks of age for KP-/-C mice and littermate controls (n = 1 KP-/-C, n = 11 control). (e) Mean (±s.e.m.) total plasma BCAA levels from KP-/-C mice
and littermate controls at 3-4 weeks of age (n = 10 KP-/-C, n = 14 control, Student’s t-test, P = 0.002). (f) P values for comparison of circulating amino
acid levels in KP-/-C mice and littermate controls at 3-4 weeks of age (n = 10 KP-/-C, n = 14 control). Red points indicate BCAAs. The dashed red line
indicates P value of 0.05. (g) Top, glucose tolerance test in KP-/-C mice and littermate controls at the time of weaning (n = 1 KP-/-C, n = 11 control).
Bottom, insulin tolerance test in KP-/-C mice and littermate controls at 4 weeks of age (n = 1 KP-/-C, n = 15 control). Error bars indicate s.e.m.
(h) Mean (±s.e.m.) fasting plasma insulin levels from KP-/-C mice and littermate controls at 4 weeks of age (n = 1 KP-/-C, n = 11 control).
of the three BCAAs were highly correlated (Supplementary
Table  4),  reflecting  their  common  pathways  of  metabolism12  and 
leading  to  similar  results  for  the  sum  total  of  BCAAs  (Table  1  and 
Supplementary  Table  3). 

Circulating  BCAAs  are  elevated  in  obese  individuals  and  those 
with  insulin  resistance13.  In  study  participants,  plasma  BCAA  levels 
modestly correlated with markers of energy balance, obesity and glucose
intolerance (Supplementary Table 4). To evaluate the independent
effect of BCAAs on PDAC risk, we assessed models that included
these markers and found that the odds ratios for PDAC remained
largely unchanged (Table 1). Elevated circulating levels of BCAAs are
also associated with future risk of diabetes11-14. As type 2 diabetes is
a predisposing factor for PDAC15, we questioned whether the intermediate
development of diabetes underlay the association of BCAAs
with future PDAC diagnosis. Exclusion of subjects with diabetes at
blood collection did not change our results (Table 1), indicating that
we had not identified a signature of prevalent diabetes associated
with later PDAC diagnosis. To determine whether increased circulating
BCAAs identify a population at risk for diabetes, who are then
at elevated risk of PDAC, we excluded subjects who developed diabetes
between the time of blood collection and cancer diagnosis and
found the results unchanged (Table 1). These data suggest that the
association  of  circulating  BCAAs  with  future  PDAC  diagnosis  is  not 
dependent  on  intermediate  development  of  diabetes. 

To examine the contribution of circulating BCAAs to risk stratification
models for PDAC, we evaluated the area under the curve
(AUC) of receiver-operating-characteristic (ROC) curves16 and net
reclassification improvement (NRI)17 with low-risk and high-risk categories.
Compared to the base model, including circulating BCAAs
led  to  a  significant  increase  in  AUC  (Supplementary  Table  5a  and 
Supplementary  Fig.  2)  and  a  net  8.2%  of  cases  moving  to  the  high- 
risk  category  with  an  NRI  of  5%  (Supplementary  Table  5b).  Thus,  in 
our  population,  inclusion  of  circulating  BCAAs  in  risk  stratification 
models  improved  the  ability  to  identify  future  PDAC  cases. 
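The two summary measures named above can be sketched in a few lines. This is an illustrative implementation under stated assumptions, not the authors' analysis code: AUC is computed as the probability that a randomly chosen case has a higher predicted risk than a randomly chosen control, and the two-category NRI is taken as the net proportion of cases moving to the high-risk category plus the net proportion of controls moving to the low-risk category. All function names and the toy inputs are hypothetical.

```python
def auc(case_scores, control_scores):
    """Probability that a random case outscores a random control (ties count 0.5)."""
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


def net_moved_up(old_high, new_high):
    """Net fraction of a group reclassified from low risk to high risk."""
    up = sum(1 for o, n in zip(old_high, new_high) if n and not o)
    down = sum(1 for o, n in zip(old_high, new_high) if o and not n)
    return (up - down) / len(old_high)


def two_category_nri(case_old, case_new, control_old, control_new):
    """NRI = net upward movement in cases + net downward movement in controls."""
    return net_moved_up(case_old, case_new) - net_moved_up(control_old, control_new)


# Toy predicted risks and high-risk flags (True = high risk); not study data.
toy_auc = auc([0.9, 0.8, 0.4], [0.5, 0.3, 0.4])
toy_nri = two_category_nri(
    case_old=[False, False, True, True],
    case_new=[True, False, True, True],
    control_old=[True, False, False, False],
    control_new=[False, False, False, False],
)
```

If the paper used this definition, a net 8.2% of cases moving up together with an NRI of 5% would imply that the control term was negative, that is, some controls were also reclassified upward; the source does not spell this out, so treat that reading as an inference.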

In  stratified  analyses,  we  noted  no  significant  differences  in  the 
association of BCAAs with PDAC by cohort, sex, smoking status,
body-mass index (BMI) or fasting status at blood collection
(Supplementary  Fig.  3,  all  interaction  P  >  0.14).  To  examine  when 
circulating  BCAAs  were  most  associated  with  PDAC,  we  stratified 
cases  and  matched  controls  by  time  interval  between  blood  collection 
and  PDAC  diagnosis  (<2  years,  2  to  <5  years,  5  to  <10  years  and  >10 
years).  These  analyses  demonstrated  particularly  strong  associations 
between  elevated  BCAAs  at  2-5  years  before  diagnosis  and  future 
PDAC  diagnosis  (Fig.  2a  and  Supplementary  Table  6). 

Experimental  studies  indicate  years  elapse  between  formation  of 
the  initial  malignant  clone  and  cancer  diagnosis18,  suggesting  that 
occult  PDAC  might  have  been  present  at  the  time  points  showing 
the strongest associations with elevated BCAAs. We therefore hypothesized
that elevated circulating BCAAs are a marker of early PDAC.
To test this possibility, we conducted a prospective serial blood sampling
study using lox-stop-lox (LSL)-KrasG12D/+; LSL-Trp53R172H/+;
Pdx1-cre (KPC) mice, which develop PDAC with variable latency19.
KPC mice progress through all histological stages of disease, from
normal pancreata to invasive adenocarcinoma, with a median survival
of approximately 21 weeks19 (Supplementary Fig. 4a). KPC
mice  initially  displayed  similar  BCAA  levels  to  littermate  controls,  but 
they  developed  significant  elevations  from  15-17  weeks  before  death 
(Fig.  2b  and  Supplementary  Fig.  4b).  These  data  suggest  circulating 
BCAA  elevations  accompany  early  PDAC. 

LSL-KrasG12D/+; Trp53flox/flox; Pdx1-cre (KP-/-C) mice develop
PDAC with more consistent kinetics, displaying precursor lesions with
limited invasive cancer by 3-4 weeks of age (Fig. 2c) and a median
lifespan of 10-12 weeks20. In mice at 3-4 weeks of age, we observed
no difference in body weight or food consumption between KP-/-C
mice and littermate controls lacking either Pdx1-Cre or LSL-KrasG12D
or both, suggesting animals with early PDAC had not yet developed
overt constitutional symptoms (Fig. 2d and Supplementary Fig. 4c).
Consistent with findings in patients and KPC mice, circulating BCAA
levels were higher in KP-/-C animals with subclinical PDAC when
compared with those in littermate control mice (Fig. 2e), a pattern
not observed for most other amino acids (Fig. 2f and Supplementary
Fig. 4d,e). We observed no significant differences in fasting blood


Figure 3 BCAA elevations are derived from
a long-term pool of amino acids. (a) Plasma
levels (mean ± s.e.m.) of 13C-labeled leucine
(M+6) and 13C-labeled valine (M+5) normalized
to food intake over time following a two-hour
exposure to diets containing 13C-labeled leucine
and 13C-labeled valine (n = 8 KP-/-C, n = 6
control). The time points correspond to the
red arrowheads in the diagram. (b) Diagram of
experiment using labeled diets to investigate
contributions to plasma BCAA levels from long-term
pools. Two cohorts of mice were used for
these experiments, one killed in the fed state
glucose, response to glucose load, response to insulin challenge or
fasting insulin levels during intraperitoneal glucose and insulin tolerance
tests in 4-week-old KP-/-C and control mice (Fig. 2g,h and
Supplementary Fig. 4f-i). These findings argue that BCAA elevations
are not reflective of hyperglycemia or insulin resistance and instead
represent an early consequence of subclinical PDAC.

We examined whether malignancies in other tissues induced by
the same genetic lesions could cause elevated plasma BCAA levels.
Cre recombinase introduction into lung or muscle of mice with
the LSL-KrasG12D and Trp53flox/flox alleles from the KP-/-C model
leads to non-small-cell lung cancer and sarcoma, respectively21-23.
Neither model displayed the BCAA alterations seen with early PDAC
(Supplementary Fig. 5). Subcutaneous and orthotopic implantation
of cancer cell lines derived from the KP-/-C model into immunocompetent
syngeneic hosts both also failed to cause elevated BCAA levels
(Supplementary Fig. 6). These data argue that elevations in BCAAs
are associated with early-stage autochthonous tumors arising in the
pancreas and are not a general feature of Kras-driven cancer. They
also suggest that implantation of cells from end-stage disease does not
model the early disease state that results in BCAA elevations.

Chronic pancreatitis is a risk factor for human PDAC24, and pancreatic
inflammation can promote PDAC development and progression
in mice25,26. Therefore, we examined whether BCAA elevations might
be  a  cause  or  consequence  of  pancreatic  inflammation  in  early  disease 
in  mice.  Mild,  chronic  pancreatitis  induced  by  caerulein  in  the  absence 
of  tumorigenesis  failed  to  cause  elevations  in  BCAAs  (Supplementary 
Fig.  7a-h),  and  prolonged  increases  in  plasma  BCAA  levels  caused 
by  dietary  interventions  did  not  cause  pancreatic  inflammation  or 
pancreatitis  (Supplementary  Fig.  7i-o).  Nevertheless,  further  studies 
are  needed  to  understand  the  relationship  between  BCAAs  and  more 
severe  pancreatitis. 

Unlike levels of other amino acids, plasma BCAA levels are not
regulated by the liver27,28; instead, levels are determined by dietary
uptake, tissue utilization and breakdown of muscle and other body
protein stores27,29. Therefore, plasma BCAAs may originate from
short-term pools related to dietary uptake and disposal or long-term
pools related to breakdown of tissue proteins. To determine the
involvement of the short-term pool, we fed 4-week-old KP-/-C mice
and littermate control mice lacking either Pdx1-Cre or LSL-KrasG12D
or both a defined amino acid diet, in which 20% of leucine and valine
were 13C labeled. KP-/-C and control mice consumed similar amounts
of food when exposed to labeled diet for 2 h (Supplementary Fig. 8a),
and we observed no difference in appearance and disappearance of
plasma 13C label (Fig. 3a and Supplementary Fig. 8b), arguing that
gut uptake and peripheral disposal of BCAAs are similar in mice with
or without PDAC.

To determine the contribution of long-term BCAA pools to plasma
levels, we exposed mice to labeled diet during a period of rapid growth
in early life and then switched them to unlabeled diet for 3 d to chase
label from the short-term pool (Fig. 3b). Despite similar peripheral
tissue protein labeling (Fig. 3c and Supplementary Fig. 8c), the fraction
of labeled BCAAs in plasma was elevated in 4-week-old KP-/-C
mice relative to that in littermate controls (Fig. 3d). Furthermore,
by  comparing  the  amount  of  label  in  plasma  under  fed  conditions, 
encompassing  both  labeled  long-term  and  unlabeled  short-term 
pools,  to  that  under  fasted  conditions,  in  which  only  labeled  long-term 
pools  contribute,  we  calculated  that  increased  liberation  of  BCAAs 
from  long-term  body  stores  was  solely  responsible  for  the  elevations 
in  BCAAs  in  KP-/-C  mice  (Fig.  3d,e  and  Supplementary  Fig.  8d). 
These data suggest that an early consequence of PDAC is enhanced
breakdown of tissue proteins leading to elevated plasma BCAA levels.
Consistent with this hypothesis, KP-/-C mice with early PDAC
had  smaller  fast-twitch  muscles  with  no  changes  in  slow-twitch  and 
cardiac  muscle  weight  (Fig.  3f  and  Supplementary  Fig.  9).  Notably, 
muscle  atrophy  associated  with  prolonged  fasting  and  late-stage 
malignancy  exhibits  a  similar  pattern30-32. 

Increased  muscle  catabolism  represents  one  aspect  of  cancer- 
associated  cachexia,  a  wasting  syndrome  frequently  affecting  patients 
with  advanced  PDAC  and  contributing  to  worse  outcomes33-36.  Our 
findings,  however,  suggest  that  protein  breakdown  begins  much  earlier 
than  previously  appreciated  and  predates  onset  of  clinical  cachexia. 
Inflammatory  cytokines  produced  by  immune  and/or  tumor  cells  have 
been implicated in cachexia31,37, and the low disease burden at the
time of BCAA elevation suggests hormonal factors may be involved
in early PDAC to cause these elevations as well. Liberation of tissue
amino acids could support the elevated amino acid requirements of
pancreatic cancer cells38,39, with BCAAs and/or other amino acids
derived from tissue breakdown contributing to disease progression.
Because hepatic metabolism maintains relatively constant plasma
levels of all amino acids except BCAAs27,29,40, increased liberation
of tissue amino acids would be expected to raise BCAA concentrations
in blood. The association between elevated BCAA levels and other
metabolic disease states11,13,14,41 suggests that high plasma BCAA
concentrations could be a general marker of increased protein turnover,
and elevated BCAA levels could contribute to the peridiagnostic
hyperglycemia  commonly  found  in  patients  with  PDAC42. 

In  participants  from  four  large  prospective  cohorts,  circulating 
BCAAs  were  associated  with  future  diagnosis  of  PDAC.  We  observed 
similar BCAA elevations in two mouse models of PDAC and demonstrated
that these elevations result from breakdown of peripheral protein
stores.  These  findings  provide  new  insight  into  how  early  disease  affects 
whole-body  metabolism  and  suggest  that  muscle  protein  loss  occurs 
much  earlier  in  disease  progression  than  previously  appreciated. 

METHODS

Methods  and  any  associated  references  are  available  in  the  online 
version  of  the  paper. 

Note: Any Supplementary Information and Source Data files are available in the
online version of the paper.

Study population. Our study population included pancreatic cancer cases
and  controls  from  four  prospective  cohort  studies:  the  Health  Professionals 
Follow-Up  Study  (HPFS),  the  Nurses’  Health  Study  (NHS),  the  Physicians’ 
Health  Study  I  (PHS)  and  the  Women’s  Health  Initiative-Observational  Study 
(WHI-OS).  HPFS  was  initiated  in  1986  when  51,529  US  men  40-75  years  of 
age  working  in  health  professions  completed  a  mailed  biennial  questionnaire. 
NHS  was  established  in  1976  when  121,700  female  nurses  aged  30-55  years 
completed a mailed biennial questionnaire. PHS is a completed trial initiated
in 1982 of aspirin and β-carotene among 22,071 male physicians, aged
40-84  years.  After  trial  completion,  study  participants  were  followed  as  an 
observational  cohort.  WHI-OS  consists  of  93,676  postmenopausal  women, 
aged  50-79  years,  enrolled  from  1994-1998  at  40  US  clinical  centers. 
Participants completed a baseline clinic visit and annual mailed questionnaires.
The study was approved by the Human Research Committee at Brigham
and  Women’s  Hospital  (Boston,  MA),  and  participants  provided  written 
informed  consent. 

We  included  incident  pancreatic  adenocarcinoma  cases  diagnosed  after  blood 
collection  through  2010  with  available  plasma  and  no  prior  history  of  cancer. 
Cases were identified by self-report or follow-up of deaths. Deaths were ascertained
from next-of-kin, postal service or National Death Index; this method
captures >98% of deaths43. Medical records were reviewed by physicians blinded
to exposure data to confirm pancreatic cancer diagnoses. Similar to prior studies
in these cohorts4,44-46 and based on a predefined analysis plan, we included only
cases  diagnosed  >2  years  after  blood  collection,  as  the  weight  loss  and  insulin 
resistance  that  develop  due  to  pancreatic  cancer  manifest  in  the  2  years  before 
diagnosis42,47.  For  each  case,  we  randomly  selected  two  controls,  matching  on 
cohort  (also  matches  on  sex),  year  of  birth  (±5  years),  smoking  status  (never, 
past, current, missing), fasting status (<8 h, >8 h), and month/year of blood collection
(±3 months in HPFS, ±3 months in NHS, ±6 months in PHS, and exact
matching in WHI). Controls were alive without cancer at the case’s diagnosis
date and provided a blood sample. Covariate data were obtained from baseline
questionnaires in PHS and WHI and questionnaires before blood collection in
HPFS  and  NHS,  as  described  previously4,45.  In  HPFS,  NHS,  and  PHS,  cancer 
stage  among  cases  was  directly  classified  based  on  medical  record  review  as 
local  disease  amenable  to  surgical  resection,  locally  advanced  disease  that  is 
unresectable  but  without  distant  metastases,  or  distant  metastatic  disease.  In 
WHI,  medical  records  were  coded  using  Surveillance  Epidemiology  End  Results 
summary  staging,  which  classifies  tumors  as  localized,  regional,  or  distant.  These 
stages  were  then  classified  in  the  same  manner  as  in  HPFS,  NHS  and  PHS,  as 
local,  locally  advanced,  and  metastatic  disease,  respectively. 

The  initial  data  set  included  454  cases  and  908  controls.  Seven  controls  had 
insufficient  plasma  for  metabolite  profiling.  One  case  and  one  control  were 
excluded  due  to  missing  data  for  >10%  of  metabolites. 


of  plasma,  2  tubes  of  white  blood  cells  and  1  tube  of  red  blood  cells.  Samples  were 
immediately  frozen  in  vapor-phase  liquid  nitrogen  freezers.  The  NHS  Blood 
Lab  stores  all  biologic  samples  associated  with  the  Blood  Study  in-house  in  a 
large  liquid  nitrogen  freezer  farm.  The  cryotubes  are  stored  in  the  vapor  phase 
of liquid-nitrogen freezers; the highest freezer temperature is −130 °C near the
top of the freezer, and the lowest temperature is −196 °C at the bottom near the
liquid  nitrogen.  All  freezers  are  alarmed  and  monitored  continuously  either  by 
NHS  laboratory  staff  or  a  central  security  desk  (nights  and  weekends). 

Physicians’ Health Study. Blood collection kits were sent to all participants
with instructions to have blood drawn into the EDTA tubes that were provided.
Two tubes were centrifuged for plasma, and a third tube was for whole blood.
The specimens were received in the laboratory on chill packs within 24 h of being
drawn. Upon receipt, the samples were refrigerated and re-aliquotted into nine
1.2-mL tubes (three whole blood and six plasma), all frozen at −82 °C.

Women’s Health Initiative. Blood samples were collected on all WHI-OS
participants at a baseline clinic visit in the fasting state. Blood samples were
maintained at 4 °C for up to one hour until plasma or serum was separated from
cells. Centrifuged aliquots were put into −70 °C freezers within two hours of
collection. Samples were shipped frozen by overnight delivery to a central facility
and kept within −70 °C freezers.

Plasma  samples  were  grouped  based  on  cohort,  so  that  all  cases  and  controls 
from  a  single  cohort  study  underwent  metabolite  profiling  as  a  batch.  Sample 
triplets  (pancreatic  cancer  case,  matched  control  #1,  and  matched  control  #2) 
were  distributed  randomly  within  the  batch,  and  the  order  of  the  case  and  two 
matched  controls  within  each  triplet  was  also  randomly  designated.  Therefore, 
the  case  and  its  two  controls  were  always  run  in  the  same  batch  and  were  always 
directly  adjacent  to  each  other  in  the  analytic  run,  thereby  limiting  variability 
in  platform  performance  across  matched  case-control  triplets. 
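
The run-order scheme described above can be sketched as follows. This is a minimal illustration only: the function name, sample identifiers, and seed are hypothetical, and the text does not specify how the randomization was actually implemented.

```python
import random

def randomize_run_order(triplets, seed=None):
    """Return a flat run order in which each matched triplet stays adjacent.

    `triplets` is a list of (case, control_1, control_2) identifiers.
    The identifiers and the seed are illustrative assumptions.
    """
    rng = random.Random(seed)
    shuffled = [list(t) for t in triplets]
    rng.shuffle(shuffled)        # random position of each triplet within the batch
    for triplet in shuffled:
        rng.shuffle(triplet)     # random order of case and controls within the triplet
    return [sample for triplet in shuffled for sample in triplet]

order = randomize_run_order(
    [("case1", "ctl1a", "ctl1b"), ("case2", "ctl2a", "ctl2b")], seed=0
)
```

Whatever the seed, every block of three consecutive positions holds one complete triplet, which is the property that keeps a case next to its own controls in the analytic run.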

For  participants  from  all  four  cohorts,  plasma  samples  were  thawed  once 
to  aliquot  them  from  large-volume  vials  into  the  smaller  volumes  needed  for 
shipment  to  the  Broad  Institute  of  the  Massachusetts  Institute  of  Technology 
and  Harvard  University  (Cambridge,  MA).  The  samples  were  refrozen  at  the 
Broad  Institute  and  then  thawed  a  second  time  to  perform  metabolite  profiling. 
Therefore,  for  all  cases  and  controls,  plasma  samples  had  been  thawed  twice  at 
the  time  of  metabolite  profiling. 

We previously measured hemoglobin A1c (HbA1c) in 389 cases and 757
controls  in  the  laboratory  of  N.  Rifai  (Children’s  Hospital,  Boston,  MA)  using 
reagents  from  Roche  Diagnostics  (Indianapolis,  IN).  We  measured  plasma 
insulin  in  386  cases  and  743  controls,  plasma  proinsulin  in  388  cases  and  746 
controls,  and  plasma  C-peptide  in  408  cases  and  785  controls  using  reagents 
from  Diagnostic  Systems  Laboratory  (Webster,  TX)  and  Millipore  Corporation 
(Billerica,  MA).  Randomly  inserted  samples  from  quality  control  (QC)  plasma 
pools had mean intra-assay coefficients of variation (CVs) of 2.0% for HbA1c,
5.4% for insulin, 3.1% for proinsulin, and 4.9% for C-peptide4.


Plasma  samples.  Blood  samples  in  EDTA  tubes  were  collected  from  18,225 
men  in  HPFS  from  1993-1995,  14,916  men  in  PHS  from  1982-1984,  and 
93,676  women  in  WHI-OS  from  1994-1998,  and  in  heparin  tubes  for  32,826 
women  in  NHS  from  1989-1990.  Comparisons  of  participant  characteristics  in 
the  blood  collection  cohort  and  the  full  cohort  are  provided  for  each  study  in 
Supplementary Note, Table S1. Blood samples in HPFS and NHS were collected
by  participants,  mailed  overnight  on  cold  packs,  and  then  spun  to  collect  and 
store plasma (delayed processing), whereas PHS and WHI participants’ whole
blood was separated immediately into plasma and stored. An overview of procedures
for collection and storage of samples from each cohort is provided below
and  summarized  in  Supplementary  Note,  Table  S2. 

Health  Professionals  Follow-up  Study.  Upon  arrival  at  the  blood  lab,  vials  were 
centrifuged  in  order  to  separate  the  various  component  parts.  Cryo  storage  tubes 
were  labeled  with  the  appropriate  study  member’s  ID  number,  and  the  separated 
blood  components  were  pipetted  into  them.  This  process  produced  5  tubes  of 
plasma,  2  tubes  of  white  blood  cells,  and  1  tube  of  red  blood  cells  for  each  cohort 
member.  The  tubes  were  then  stored  in  liquid  nitrogen  freezers.  A  bulk  tank, 
holding up to 3,000 gallons of liquid nitrogen, automatically feeds each individual
freezer whenever the freezer’s sensors indicate that coolant is required.

Nurses’ Health Study. Blood samples were separated into components (plasma,
white  blood  cells  and  red  blood  cells)  and  pipetted  into  8  cryotubes  with  5  tubes 


Metabolite  profiling.  Profiles  of  endogenous  polar  metabolites  were  obtained 
using  liquid  chromatography-tandem  mass  spectrometry  (LC-MS)  at  the  Broad 
Institute  of  the  Massachusetts  Institute  of  Technology  and  Harvard  University 
(Cambridge, MA). The LC-MS methods were designed to enable broad measurement
of metabolic markers and intermediates, including metabolites from
central metabolism and amino acid metabolism, using low plasma sample volumes48.
LC-MS parameters for targeted analyses, including chromatographic
retention times and mass spectrometry multiple reaction monitoring settings
(declustering potentials, collision energies, and lens voltages), were determined
using over 300 commercially available reference compounds. A subset
of  133  polar  metabolites  were  measurable  in  plasma  using  a  combination  of 
two  distinct  hydrophilic  interaction  liquid  chromatography  (HILIC)  methods, 
one  operated  under  acid  mobile  phase  conditions  with  positive-ion-mode  MS 
detection  and  the  other  under  basic  elution  conditions  with  negative-ion-mode 
MS  detection. 

The acidic HILIC method using positive-ionization-mode MS analyses was
similar to the method described by Wang et al.11. Briefly, the LC-MS system
consisted of a 4000 QTRAP triple quadrupole mass spectrometer (AB SCIEX)
coupled to an 1100 Series pump (Agilent) and an HTS PAL autosampler (Leap
Technologies). Plasma samples (10 µL) were extracted using nine volumes
of 74.9:24.9:0.2 (v/v/v) acetonitrile/methanol/formic acid containing stable
isotope-labeled internal standards (valine-d8, Isotec; and phenylalanine-d8,
Cambridge Isotope Laboratories). The samples were centrifuged (10 min,
9,000g, 4 °C), and the supernatants (10 µL) were injected onto an Atlantis
HILIC column (150 × 2.1 mm, 3 µm particle size; Waters Inc.). The column was
eluted isocratically at a flow rate of 250 µL/min with 5% mobile phase A (10 mM
ammonium formate and 0.1% formic acid in water) for 1 min followed by a linear
gradient to 40% mobile phase B (acetonitrile with 0.1% formic acid) over 10 min.
The ion spray voltage was 4.5 kV and the source temperature was 450 °C.

A second method using basic HILIC separation and negative ionization mode
MS detection was established on an LC-MS system consisting of an ACQUITY
UPLC (Waters Inc.) coupled to a 5500 QTRAP triple quadrupole mass spectrometer
(AB SCIEX). Plasma samples (30 µL) were extracted using 120 µL of 80%
methanol (VWR) containing the internal standards inosine-15N4, thymine-d4,
and glycocholate-d4 (Cambridge Isotope Laboratories). The samples were centrifuged
(10 min, 9,000g, 4 °C), and the supernatants were injected directly onto
a Luna NH2 column (150 × 2.0 mm, 5 µm particle size; Phenomenex) that was
eluted at a flow rate of 400 µL/min with initial conditions of 10% mobile phase A
(20 mM ammonium acetate and 20 mM ammonium hydroxide (Sigma-Aldrich)
in water (VWR)) and 90% mobile phase B (10 mM ammonium hydroxide in
75:25 v/v acetonitrile/methanol (VWR)) followed by a 10-min linear gradient
to 100% mobile phase A. The ion spray voltage was −4.5 kV and the source
temperature was 500 °C.

Raw data were processed using MultiQuant 1.2 software (AB SCIEX) for
automated  LC-MS  peak  integration.  All  chromatographic  peaks  were  manually 
reviewed  for  quality  of  integration  and  compared  against  a  known  standard  for 
each  metabolite  to  confirm  compound  identities.  Internal  standard  peak  areas 
were  monitored  for  quality  control,  to  assess  system  performance  over  time,  and 
to  identify  any  outlier  samples  requiring  re-analysis.  A  pooled  plasma  reference 
sample  was  also  analyzed  after  sets  of  20  study  samples  as  an  additional  quality 
control  measure  of  analytical  performance  and  to  serve  as  reference  for  scaling 
raw  LC-MS  peak  areas  across  sample  batches.  Metabolites  with  a  signal-to-noise 
ratio  <10  were  considered  unquantifiable.  Metabolite  signals  were  analyzed  in 
relation  to  pancreatic  cancer  risk  as  LC-MS  peak  areas,  which  are  proportional 
to  metabolite  concentration  and  appropriate  for  metabolite  clustering  and 
correlative  analyses. 

Of the 133 metabolites measured (Supplementary Note, Fig. S1), 83 were
included in the analyses of our nested pancreatic cancer case-control population
(Supplementary Note, Table S3). In pilot work49, we determined that 32
metabolites had poor reproducibility in samples with delayed processing, so
these metabolites were excluded as they could not be reliably measured in two
of the participating cohorts. In the current study, three heparin plasma pools
(57 total QC samples) and three EDTA plasma pools (128 total QC samples)
were randomly interspersed among participant samples as blinded QC samples.
We calculated mean CVs for each metabolite across QC plasma pools and set
an a priori threshold of ≤25% for satisfactory reproducibility. Based on this
criterion, 13 metabolites with mean CV >25% were excluded from our analyses.
Five metabolites had undetectable levels for >10% of participants and were
also  excluded.  We  evaluated  plasma  from  ten  volunteers  with  plasma  collected 
simultaneously  in  heparin  and  EDTA  tubes.  For  the  branched  chain  amino  acids, 
Spearman correlation coefficients between heparin and EDTA samples were
0.85  for  isoleucine,  0.88  for  leucine,  and  0.95  for  valine. 
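
As an illustration of the reproducibility filter and the metabolite bookkeeping in this paragraph, the following sketch computes a mean CV across QC pools and applies a 25% cut-off. The peak areas are invented, and averaging per-pool CVs is one plausible reading of "mean CVs ... across QC plasma pools", not a confirmed description of the authors' computation.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """CV in percent: sample standard deviation divided by the mean."""
    return stdev(values) / mean(values) * 100.0

def mean_cv_across_pools(pools):
    """Average the per-pool CVs for one metabolite (assumed definition)."""
    return mean(coefficient_of_variation(pool) for pool in pools)

def passes_qc(pools, threshold=25.0):
    """True when the mean CV does not exceed the a priori threshold."""
    return mean_cv_across_pools(pools) <= threshold

# Hypothetical LC-MS peak areas for two metabolites in two QC pools.
stable_metabolite = [[100, 105, 95], [200, 210, 190]]
noisy_metabolite = [[100, 180, 40], [200, 60, 330]]

# Counts taken from the text: 133 measured, minus 32 (delayed-processing
# instability), 13 (mean CV > 25%), and 5 (undetectable in > 10% of participants).
n_analyzed = 133 - 32 - 13 - 5
```

The subtraction reproduces the 83 metabolites carried into the case-control analyses.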

For  metabolites  meeting  the  threshold  for  statistical  significance  after 
multiple-hypothesis  correction  (isoleucine,  leucine  and  valine),  LC-MS  peak 
areas  were  converted  to  absolute  concentrations  using  stable  isotope-labeled 
standards.  Briefly,  external  calibration  curves  of  MS  response  were  determined 
using solutions of isotope-labeled 13C6, 15N-leucine, 13C6, 15N-isoleucine
(Cambridge Isotope Laboratories), and d8-valine (Isotec). A 1 µg/µL solution
of each standard was prepared in water. 20 µL of each stock solution were added
to 180 µL of reference pooled plasma, and the resulting solution was then serially
diluted using pooled plasma to generate a calibration curve. For multiple reaction
monitoring MS analyses, the bond cleavage products and collision energy
(CE) and declustering potential (DP) settings were the same as those used for
the endogenous metabolites: natural leucine 132/86, CE = 18 and DP = 50; 13C6,
15N-leucine 134/87, CE = 18 and DP = 50; natural isoleucine 132/86, CE = 18
and DP = 50; 13C6, 15N-isoleucine 139/92, CE = 18 and DP = 50; natural
valine 118/72, CE = 18 and DP = 25, and d8-valine 126/80, CE = 18 and DP = 25.


Three  separate  plasma  samples  were  prepared  at  each  concentration  and  were 
analyzed  using  the  acidic  HILIC  LC-MS  method  described  above.  The  median 
concentrations  of  endogenous  isoleucine,  leucine  and  valine  in  the  reference 
pooled  plasma  were  calculated,  and  the  concentration  of  each  metabolite  in 
each  study  sample  was  determined  from  the  response  ratio  relative  to  the  nearest 
reference  pooled  plasma  sample  in  the  analysis  queue. 
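
The final scaling step can be sketched as below, assuming that peak area is proportional to concentration and that "nearest" means closest position in the injection queue. The function names and numbers are hypothetical and serve only to make the arithmetic concrete.

```python
def nearest_reference(sample_position, reference_positions):
    """Queue position of the pooled-plasma reference closest to the sample."""
    return min(reference_positions, key=lambda pos: abs(pos - sample_position))

def concentration_from_reference(sample_area, reference_area, reference_conc):
    """Scale the reference concentration by the sample/reference response ratio."""
    return sample_area / reference_area * reference_conc

# Hypothetical queue: references injected at positions 0, 21, and 42.
ref_pos = nearest_reference(25, [0, 21, 42])

# Hypothetical areas: a sample with twice the reference peak area is assigned
# twice the reference concentration (units follow the calibrated reference).
conc = concentration_from_reference(2.0e6, 1.0e6, 100.0)
```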

Statistical  analyses  for  human  studies.  To  compare  baseline  characteristics,  we 
used  conditional  logistic  regression  conditioned  on  the  matching  factors  and 
including  the  covariate  of  interest.  Partial  Spearman  correlation  coefficients  were 
calculated  for  metabolites  and  covariates.  Metabolites  were  log-transformed  to 
improve normality and included as continuous variables in conditional logistic
regression models conditioned on matching factors and adjusted for age
at blood draw (years, continuous), fasting time (<4 h, 4-8 h, 8-12 h, >12 h,
missing) and race or ethnicity (white, black, other, missing). Using a conservative
Bonferroni correction for multiple-hypothesis testing50, metabolites with
P ≤ 0.0006 (0.05/83) were considered statistically significant.
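
The Bonferroni threshold quoted above follows directly from dividing the family-wise error rate by the number of metabolites tested. A short check, with the helper name chosen here for illustration and the comparison assumed to be "at or below the threshold":

```python
ALPHA = 0.05        # family-wise significance level
N_METABOLITES = 83  # number of metabolites tested, from the text

threshold = ALPHA / N_METABOLITES  # approximately 0.0006

def is_significant(p_value, alpha=ALPHA, n_tests=N_METABOLITES):
    """Bonferroni rule: compare each P value with alpha / number of tests."""
    return p_value <= alpha / n_tests
```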

To provide estimates of effect magnitude, significant metabolites were examined
in conditional logistic regression models after categorization into quintiles
based on log-transformed metabolite levels in controls. Separate quintiles
were  generated  for  fasting  (>8  h  since  last  meal)  and  nonfasting  (<8  h  since  last 
meal)  participants,  given  the  possible  effect  of  fasting  time  on  metabolite  levels. 
Quintiles  were  generated  from  the  population  of  selected  controls,  which  may 
not  exactly  reflect  the  characteristics  of  the  full  cohort  population.  Odds  ratios 
(ORs)  and  95%  confidence  intervals  (CIs)  were  also  calculated  per  s.d.  change 
in log-transformed metabolite levels. To control for possible confounding, we
evaluated regression models adjusted for body-mass index (BMI), physical activity,
history of diabetes mellitus, HbA1c, plasma insulin, plasma proinsulin, and
plasma C-peptide. We also evaluated regression models after excluding subjects
with diabetes by self-report or HbA1c >6.5% at blood collection (prevalent
diabetes). We further evaluated models that excluded subjects who developed
diabetes after blood collection but >2 years before cancer diagnosis (incident
diabetes not thought to be recent onset from pancreatic cancer)42,47.
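
The control-based quintile categorization described in this paragraph might be sketched as follows. The interpolation rule used for the cut points (the default of Python's `statistics.quantiles`) is an assumption, as the text does not state one, and in practice the cut points would be derived separately for fasting and nonfasting participants.

```python
import math
from bisect import bisect_right
from statistics import quantiles

def quintile_cutpoints(control_levels):
    """Four cut points from log-transformed metabolite levels in controls."""
    logs = [math.log(x) for x in control_levels]
    return quantiles(logs, n=5)

def assign_quintile(level, cutpoints):
    """Quintile (1 to 5) for any participant, case or control."""
    return bisect_right(cutpoints, math.log(level)) + 1

# Hypothetical control levels (arbitrary units).
controls = list(range(1, 101))
cuts = quintile_cutpoints(controls)
lowest = assign_quintile(1, cuts)
highest = assign_quintile(100, cuts)
```

Deriving the cut points from controls only, and then applying them to cases, keeps the category boundaries independent of case status.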

Metabolite  values  were  considered  missing  when  an  LC-MS  peak  was  below 
the  limit  of  detection.  In  the  primary  analysis,  any  case  or  control  with  missing 
data  for  a  metabolite  was  excluded  from  the  analysis  of  that  metabolite.  However, 
we  also  conducted  sensitivity  analyses,  in  which  participants  with  missing  values 
were  assigned  the  lower  limit  of  detection  or  half  of  the  lower  limit  of  detection, 
and  our  results  were  unchanged. 

We  assessed  heterogeneity  of  metabolite  associations  with  pancreatic  cancer 
risk across cohorts using Cochran’s Q-statistic51. We examined associations in
predefined  subgroups  by  sex,  smoking  status,  BMI,  and  fasting  status.  Statistical 
interactions  were  assessed  by  entering  into  models  the  main  effect  terms  and 
cross-product  terms  of  metabolites  and  stratification  variables,  evaluating 
likelihood  ratio  tests.  We  also  examined  associations  by  time  between  blood 
collection  and  the  case’s  cancer  diagnosis.  In  these  time-based  analyses,  one 
stratum  included  40  pancreatic  cancer  cases  with  blood  collected  within  2  years 
of  diagnosis  and  their  matched  controls.  These  cases  and  controls  were  not  part 
of  the  primary  analysis  population,  but  were  included  in  the  stratified  analyses 
by time to more fully delineate the association of metabolites with pancreatic
cancer development by time before diagnosis. Associations were also examined for
circulating  BCAAs  with  previously  explored  risk  factors  for  pancreatic  cancer  in 
our  cohorts  (Supplementary  Note,  Table  S4)  and  with  cancer  stage  at  diagnosis 
(Supplementary  Note,  Table  S5). 
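
For the cohort heterogeneity assessment mentioned at the start of this paragraph, Cochran's Q can be computed from cohort-specific estimates and their standard errors. The sketch below uses the standard inverse-variance form; the input values are hypothetical, and the statistic would be referred to a chi-square distribution with (number of cohorts − 1) degrees of freedom.

```python
def cochran_q(log_odds_ratios, standard_errors):
    """Cochran's Q and its degrees of freedom for k cohort-specific estimates."""
    weights = [1.0 / se ** 2 for se in standard_errors]
    pooled = sum(w * b for w, b in zip(weights, log_odds_ratios)) / sum(weights)
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, log_odds_ratios))
    return q, len(log_odds_ratios) - 1

# Hypothetical log odds ratios from four cohorts with identical effects:
# Q is zero (up to rounding) because no estimate departs from the pooled value.
q_same, df_same = cochran_q([0.2, 0.2, 0.2, 0.2], [0.1, 0.2, 0.1, 0.3])

# Two hypothetical cohorts that disagree, with equal precision.
q_diff, df_diff = cochran_q([0.0, 1.0], [1.0, 1.0])
```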

Since  association  of  a  marker  with  disease  does  not  indicate  the  suitability 
of  the  marker  to  serve  as  a  screening  test  for  the  disease,  we  examined  two 
approaches to quantify the value of metabolites in a multifactor risk discrimination
tool for pancreatic cancer. Discrimination quantifies the ability of one
or more disease markers to separate cases (individuals with the disease) from
controls (individuals without the disease). We investigated the discrimination
of risk models for predicting pancreatic cancer diagnosis in the 10 years after
measurement  of  circulating  BCAAs,  i.e.,  the  10-year  risk  of  pancreatic  cancer. 

In the first approach, we investigated receiver-operating-characteristic (ROC)
curve analysis and calculated the area under the ROC curve (AUC), also
known as the concordance (C) statistic16,52. The base model included age at
blood collection (continuous), cohort (HPFS, NHS, PHS, WHI; which also
accounts for sex), race/ethnicity (white, black, other/missing), smoking status
(never, past, current, missing) and fasting time (<4 h, 4-8 h, 8-12 h, >12 h,
missing). Three subsequent models mirrored the base model but additionally
included (1) body-mass index, physical activity, and history of diabetes, (2)
circulating BCAAs, or (3) body-mass index, physical activity, history of diabetes,
and circulating BCAAs. Each point on the ROC curve shows the effect
of a rule for turning a risk estimate into a prediction of the development of
an event. The y axis of the ROC curve is the true positive rate or sensitivity
(i.e., the proportion of individuals with pancreatic cancer who were correctly
predicted to have the disease). The x axis shows the false positive rate, which is
the complement of specificity (i.e., the proportion of individuals without pancreatic
cancer who were incorrectly predicted to have pancreatic cancer). The
area  under  the  ROC  curve,  the  AUC,  measures  how  well  the  model  discriminates 
between  case  subjects  and  control  subjects.  An  ROC  curve  that  corresponds  to  a 
random  classification  of  case  subjects  and  control  subjects  is  a  straight  line  with 
an  AUC  of  50%.  An  ROC  curve  that  corresponds  to  perfect  classification  has  an 
AUC  of  100%.  The  improvement  in  AUC  for  a  model  containing  a  new  marker 
is  defined  as  the  difference  in  AUCs  calculated  using  a  model  with  and  without 
the  new  marker  of  interest53. 
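
The AUC described above equals the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen control, with ties counted as one half. A minimal sketch using that pairwise definition (the risk scores are invented, and this is not the SAS procedure used in the study):

```python
def auc(case_scores, control_scores):
    """Concordance statistic from all case-control pairs (ties count 0.5)."""
    concordant = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                concordant += 1.0
            elif case == control:
                concordant += 0.5
    return concordant / (len(case_scores) * len(control_scores))

def auc_improvement(cases_new, controls_new, cases_base, controls_base):
    """Difference in AUC between the model with and without the new marker."""
    return auc(cases_new, controls_new) - auc(cases_base, controls_base)

# Hypothetical predicted risks.
perfect = auc([0.9, 0.8], [0.1, 0.2])  # every case outranks every control
chance = auc([0.5], [0.5])             # a tie, equivalent to random classification
```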

For  context,  the  Breast  Cancer  Risk  Assessment  Tool,  commonly  referred  to 
as  the  Gail  model54,55,  estimates  a  woman’s  risk  for  breast  cancer  using  clinically 
available  information  including  current  age,  age  at  menarche,  age  of  first  live 
birth,  number  of  first-degree  relatives  with  breast  cancer,  number  of  previous 
breast  biopsies,  breast  biopsies  that  show  atypical  hyperplasia,  and  race/ethnicity. 
The  Gail  model  is  used  to  counsel  women  on  appropriate  screening  tests  for 
breast cancer56, for determining whether tamoxifen will be useful as a chemopreventative
agent57, and for determining sample size calculations in randomized
clinical trials of prevention strategies58. Several studies have evaluated
the  discrimination  of  the  Gail  model  using  ROC  curve  analysis  and  calculated 
the  AUC  to  be  0.58  to  0.63  (refs.  52,59-61).  Follow-up  studies  have  described 
an  AUC  of  0.62  to  0.66  when  breast  density  is  added  as  an  additional  predictor 
in  the  original  Gail  model59,60,62. 

Although ROC curves are commonly used, they have a number of limitations
and may underestimate the ability of a new marker to contribute to risk
prediction  when  added  to  previously  defined  predictors63-66.  Another  approach 
to  evaluating  model  discrimination  is  to  evaluate  the  ability  of  a  new  marker  to 
shift  an  individual’s  risk  up  or  down  between  predefined  risk  categories.  This 
is  known  as  the  prediction  increment  of  a  marker  and  has  been  codified  in  an 
approach known as net reclassification improvement (NRI)17. The NRI (sometimes
referred to as the net reclassification index) constructs reclassification
tables  separately  for  participants  with  and  without  events  and  quantifies  the 
correct  movement  between  categories  of  risk,  namely,  to  higher  risk  categories 
for  participants  with  events  and  to  lower  risk  categories  for  those  without  events. 
Furthermore,  incorrect  movement  in  categories  of  risk  (downwards  for  events 
and upwards for non-events) reduces the net correct reclassification of individuals
within the study population.

The  NRI  calculation  is  represented  by  the  following  formula: 

NRI = [P(up | D = 1) − P(down | D = 1)] − [P(up | D = 0) − P(down | D = 0)]

Upward  movement  (up)  is  defined  as  a  change  into  a  higher  risk  category  based 
on  the  new  model  and  downward  movement  (down)  as  a  change  into  a  lower  risk 
category  based  on  the  new  model,  where  P  indicates  probability  and  D  denotes 
the  event  indicator  (1,  event;  0,  non-event). 
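
The NRI as defined here (net upward movement among events minus net upward movement among non-events) can be computed from per-person risk categories under the old and new models. The sketch below uses hypothetical integer category codes, with larger values meaning higher risk.

```python
def net_reclassification_improvement(old_category, new_category, event):
    """NRI from risk categories under two models and an event indicator (1 or 0)."""
    up_event = down_event = n_event = 0
    up_nonevent = down_nonevent = n_nonevent = 0
    for old, new, d in zip(old_category, new_category, event):
        if d == 1:
            n_event += 1
            if new > old:
                up_event += 1
            elif new < old:
                down_event += 1
        else:
            n_nonevent += 1
            if new > old:
                up_nonevent += 1
            elif new < old:
                down_nonevent += 1
    events_term = (up_event - down_event) / n_event
    nonevents_term = (up_nonevent - down_nonevent) / n_nonevent
    return events_term - nonevents_term

# Hypothetical example: one of two events moves up, one of two non-events moves down.
nri = net_reclassification_improvement(
    old_category=[0, 0, 1, 1],
    new_category=[1, 0, 0, 1],
    event=[1, 1, 0, 0],
)
```

In this toy example the events term is 0.5 and the non-events term is −0.5, so the NRI is 1.0; a model that moves nobody between categories yields an NRI of 0.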

Using  the  NRI,  we  evaluated  the  ability  of  the  prediction  model  including 
circulating BCAAs to appropriately reclassify individuals into risk groups compared
to the base model. The base model was calculated using conditional logistic
regression conditioned on matching factors and adjusted for race/ethnicity,
body-mass index, physical activity and history of diabetes. The subsequent
model included the covariates in the base model with the addition of circulating
metabolites. As in prior studies67-69, we defined the high-risk group as
those  individuals  with  risk  for  pancreatic  cancer  at  least  twofold  greater  than  an 
individual  with  average  risk. 

For  context,  the  Emerging  Risk  Factors  Collaboration70  has  examined  the 
integration  of  novel  risk  factors  into  risk  prediction  models  for  cardiovascular 
disease. In these studies, additional potential risk predictors were added to a
model of known risk predictors for cardiovascular disease, including age, sex,
smoking status, blood pressure, history of diabetes, and cholesterol. The net
reclassification improvement was then calculated for three 10-year risk categories
for cardiovascular disease. C-reactive protein (CRP) is a marker of systemic
inflammation,  and  elevated  CRP  has  been  associated  with  an  increased  risk  for 
cardiovascular  events  in  numerous  studies71-73.  Circulating  CRP  is  currently 
used  to  inform  decisions  in  the  clinic  regarding  screening  and  risk  reduction 
strategies74,75  and  to  design  clinical  trials  testing  novel  treatments  to  reduce 
cardiovascular  events76,77.  In  an  analysis  of  nearly  250,000  individuals78,  the 
addition of CRP to known cardiovascular disease risk factors was associated with
a  statistically  significant  improvement  in  the  area  under  the  ROC  curve  and  a 
NRI  of  1.52%  for  10-year  risk  of  cardiovascular  disease.  In  contrast,  additional 
analyses  demonstrated  a  <1%  improvement  in  the  NRI  for  body-mass  index, 
waist  circumference,  waist-to-hip  ratio,  plasma  fibrinogen,  and  circulating 
apolipoproteins78-80,  such  that  the  clinical  utility  of  these  additional  predictors 
remains  unclear75. 

All analyses were performed with the SAS 9.2 statistical package. All P values
were  two-sided. 

Experimental mice. All studies were approved by the MIT Committee on
Animal Care (IACUC). All experimental groups were assigned based on genotype.
All animals were numbered and experiments conducted blinded. After data
collection,  genotypes  were  revealed  and  animals  assigned  to  groups  for  analysis. 
The  experiments  were  not  randomized. 

KPC. Experimental KPC mice were male mice on a mixed background,
heterozygous for the conditional lox-stop-lox KrasG12D allele, heterozygous for
the conditional lox-stop-lox Trp53R172H allele and expressing Cre recombinase
under control of the Pdx1 promoter (Tg(Ipf1-cre)1Tuv)19. Littermate controls
lacked either the LSL-KrasG12D allele, the Cre allele or both. Control mice were
killed at the same time as their tumor-bearing littermates.

KP−/−C. Experimental KP−/−C mice were male mice on a mixed background,
heterozygous for the conditional lox-stop-lox KrasG12D allele, homozygous
for loxP sites flanking exons 2-10 of Trp53 and expressing Cre recombinase
under control of the Pdx1 promoter (Tg(Ipf1-cre)1Tuv)20. Littermate control
mice lacked either the Cre-recombinase allele, LSL-KrasG12D allele or both
(controls were non-tumor-bearing mice of all genotypes). Inbred C57BL/6J
male mice containing the same alleles were also examined where indicated,
and cancer cell lines derived from these mice (established in culture from
tumors prior to the described implantation studies) were used for syngeneic
implantation studies.

Non-small-cell lung cancer. Six-month-old male mice on a pure 129 background,
heterozygous for the conditional lox-stop-lox KrasG12D allele and
homozygous for loxP sites flanking exons 2-10 of Trp53, were administered
2.5 × 10^7 PFU of Cre-expressing adenovirus intratracheally as previously
described21,22.  High-titer  adenovirus  was  obtained  from  the  Gene  Transfer 
Vector  Core  (University  of  Iowa). 

Hindlimb  sarcoma.  Four-week-old  male  mice  on  a  mixed  background, 
heterozygous for the conditional lox-stop-lox KrasG12D allele and homozygous
for loxP sites flanking exons 2-10 of Trp53 were administered 2.5 × 10^8 PFU
of  Cre-expressing  adenovirus  intramuscularly  as  previously  described23. 
High-titer  adenovirus  was  obtained  from  the  Gene  Transfer  Vector  Core 
(University  of  Iowa). 

Implantation,  pancreatitis  and  BCAA  diet  studies.  Male  C57BL/6J  mice 
aged  4-6  weeks  at  the  start  of  the  study  were  used  for  these  experiments. 

Diets. Standard chow diet was RMH 3000 (Prolab). For amino acid-defined
diets, 1x BCAA (TD.110839) and 2x BCAA (TD.110843) were designed
in consultation with and subsequently obtained from Harlan Teklad.
20% 13C-leucine- and 20% 13C-valine-labeled diets were based on diet
TD.110839 and produced by Cambridge Isotopes and Harlan Teklad.

Plasma  for  metabolomics.  Plasma  was  collected  for  each  experiment  at  the  time 
points  indicated.  Mice  were  anesthetized  under  2%  isoflurane-oxygen  mixture 
and  retro-orbitally  bled  approximately  4.5  h  after  the  onset  of  the  light  cycle. 
Blood  was  immediately  placed  in  EDTA-pretreated  tubes  and  centrifuged  to 
separate plasma. Plasma was aliquoted and frozen at -80 °C for further analysis.
Fasting blood samples were harvested in the same manner first thing in the
morning  after  a  16-hour  overnight  fast. 

Food  consumption.  Mice  were  housed  individually  for  48  h,  and  remaining  food 
pellets  weighed  at  0, 24  and  48  h.  A  two-day  average  was  then  calculated  for  each 
mouse.  Body  weight  was  determined  on  the  second  day.  To  assess  consumption 
of BCAA-defined diets, mice were housed individually and fed diets for 5 d. Food
was weighed after 2 d of feeding and again on day 5, and the average consumption
per 24 h over the 72-h period was calculated.

Blood  glucose,  plasma  insulin,  glucose  tolerance  test  and  insulin  tolerance 
test. Four-week-old KP-/-C mice were fasted overnight and blood glucose was measured
using a One Touch Ultra handheld glucometer. 25 μL of plasma from the
same mice was harvested in heparinized tubes, aliquoted, and frozen at -80 °C
for further analysis. Plasma insulin levels were determined using an ultrasensitive
mouse insulin ELISA kit (Crystal Chem, #90080). After measuring fasting
parameters, a glucose tolerance test was performed in accordance with published
protocols81. Briefly, conscious mice received an intraperitoneal injection of
2 g/kg glucose at time 0. Blood glucose was subsequently measured at 15, 30,
60, 90 and 120 min post-injection as described above. For the insulin tolerance test,
4-week-old KP-/-C mice were fasted for 6 h during daytime hours. Following
initial blood glucose measurement, conscious mice received an intraperitoneal
injection of 0.75 IU/kg recombinant human insulin (Novolin, Novo Nordisk).
Blood glucose was subsequently measured at 15, 30, 60 and 90 min post-injection
as described above.


Long-term  pool  contribution  of  BCAA  to  plasma.  Mice  were  exposed  to  20% 
13C-leucine-  and  13C-valine-labeled  diets  from  7  d  of  age  to  24  d  of  age  followed 
by  3  d  of  unlabeled  diet  (according  to  the  protocol  depicted  in  Fig.  3b).  Two 
cohorts  of  mice  were  used  in  this  study.  One  cohort  of  mice  was  killed  on  day 
27  in  the  fed  state,  and  a  second  cohort  of  mice  was  killed  on  day  28  after  a  16-h 
overnight  fast  (the  points  indicated  by  the  red  arrowheads  in  Fig.  3b).  At  time 
of  killing,  anesthetized  mice  were  terminally  bled  and  tissues  harvested  within 
5  min,  snap  frozen  in  liquid  nitrogen  using  Biosqueezer  (BioSpec  Products), 
and  stored  at  -80  °C  for  subsequent  GC-MS  analysis.  Plasma  was  aliquoted  and 
frozen at -80 °C for subsequent GC-MS analysis.

Total contributions from short- and long-term pools were calculated according
to the following equations:

\[ \text{Long Term Pool} = \left( \frac{\text{Fed \% Labeled}}{\text{Fasted \% Labeled}} \right) \cdot \text{Relative Fed Pool Size} \]

\[ \text{Short Term Pool} = \text{Relative Fed Pool Size} - \text{Long Term Pool} \]

Raw  data  are  summarized  in  Supplementary  Note,  Table  S6. 
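
The pool calculation can be illustrated with a minimal sketch. It assumes the two equations read as (fed % labeled / fasted % labeled) times the relative fed pool size for the long-term pool, with the short-term pool as the remainder of the relative fed pool size (the operator in the second equation is read here as subtraction). The function name and the input values are hypothetical and are not taken from Table S6.

```python
def pool_contributions(fed_pct_labeled, fasted_pct_labeled, relative_fed_pool_size=1.0):
    """Split the fed-state plasma pool into long- and short-term contributions.

    Assumed reading of the Methods equations:
      long  = (fed % labeled / fasted % labeled) * relative fed pool size
      short = relative fed pool size - long
    """
    if fasted_pct_labeled <= 0:
        raise ValueError("fasted % labeled must be positive")
    long_term = (fed_pct_labeled / fasted_pct_labeled) * relative_fed_pool_size
    short_term = relative_fed_pool_size - long_term
    return long_term, short_term


# Hypothetical example: 6% labeled in the fed state, 12% labeled after fasting.
long_term, short_term = pool_contributions(6.0, 12.0, relative_fed_pool_size=1.0)
```

With these illustrative inputs, half of the fed pool would be attributed to the long-term pool and half to the short-term pool.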

Tissue  and  body  weights.  For  measurement  of  tissue  weights,  mice  were 
weighed  before  killing,  then  gastrocnemius,  tibialis  anterior,  soleus  and  heart 
were  subsequently  dissected  and  weighed.  Muscle  weights  for  each  individual 
mouse  were  normalized  to  the  body  weight  of  that  mouse. 


KP-/-C cell lines and implantation studies. End-stage tumors were dissected
from C57BL/6J KP-/-C mice and mechanically chopped before trypsin disaggregation,
with tumor cells then propagated for three to five passages in DMEM
with 10% FBS, 4 mM glutamine and penicillin/streptomycin to obtain enough
cells for implantation. Cell lines were negative for mycoplasma. For subcutaneous
implantation studies, recipient mice were anesthetized with inhaled 2%
isoflurane-oxygen mixture, low passage cell lines (passage 5 for each line) were
resuspended at 2.5 × 10^5 cells per 100 μL in sterile PBS, and 100 μL of either
PBS (control) or cell suspension was injected in the flank of syngeneic mice. For
orthotopic implantation studies, recipient mice were anesthetized with inhaled
2% isoflurane-oxygen mixture, a vertical incision was made in the abdomen at the
left mid-clavicular line, the spleen was mobilized, and 25 μL of either PBS or PBS
containing 1.0 × 10^5 cells (passage 3 for each line) was injected into the tail of
the pancreas.

Caerulein-induced chronic pancreatitis. Mice were treated with either USP-grade
saline or 5 μg caerulein (Sigma) via intraperitoneal injection daily, 5 d
per week for 10 weeks as previously described25. Blood was obtained and the
mice  killed  on  the  final  day.  Tissues  were  fixed  in  10%  formalin  for  subsequent 
histological  analysis. 

Plasma  markers  of  pancreatitis.  Plasma  amylase  and  lipase  measurements  were 
performed  by  IDEXX  BioResearch  Laboratory  (North  Grafton,  MA). 

BCAA diet studies. Mice were fed either 1x or 2x BCAA diets for 10 weeks.
Blood was obtained and mice killed on the final day of the experiment. Tissues
were fixed in 10% formalin for subsequent histological analysis.

Studies  to  determine  source  of  BCAA  elevations.  Acute  uptake  and  disposal. 
Following a 16-h overnight fast, mice were fed a diet containing 20% 13C-leucine
and 13C-valine for 2 h before removal of food, and food consumption during this
period was quantified as described above. At the time points indicated by the red
arrowheads in Figure 3a, 10-25 μL of plasma was harvested from the tail vein
of conscious mice in a heparinized tube and centrifuged to separate plasma.
Plasma was aliquoted and frozen at -80 °C for subsequent GC-MS analysis. Total
ion  counts  from  GC-MS  analysis  of  leucine  and  valine  were  then  normalized  to 
norvaline  internal  standard  and  multiplied  by  fractional  labeling  to  determine 
the  amount  of  label  present.  This  number  for  each  animal  was  then  normalized 
to  that  animal’s  food  intake  to  control  for  interanimal  variation  in  labeled  food 
consumption. 
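
The normalization described above can be sketched as follows. This is an illustration only: the function and argument names are hypothetical, and the numbers are made up rather than taken from the study.

```python
def labeled_amount_per_food(total_ion_count, norvaline_ion_count,
                            fractional_labeling, food_intake_g):
    """Label present in plasma, normalized to the animal's intake of labeled food.

    Steps, following the text:
      1. normalize the amino acid ion count to the norvaline internal standard
      2. multiply by the fractional labeling to get the amount of label
      3. divide by that animal's food intake
    """
    if norvaline_ion_count <= 0 or food_intake_g <= 0:
        raise ValueError("internal standard signal and food intake must be positive")
    normalized_abundance = total_ion_count / norvaline_ion_count
    labeled_amount = normalized_abundance * fractional_labeling
    return labeled_amount / food_intake_g


# Hypothetical animal: leucine signal twice the norvaline signal,
# 10% of leucine labeled, 0.5 g of labeled diet eaten in the 2-h window.
value = labeled_amount_per_food(2.0e6, 1.0e6, 0.10, 0.5)
```

Dividing by intake puts animals that ate different amounts of labeled diet on a common scale before groups are compared.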


LC-MS  plasma  amino  acid  measurements.  Plasma  amino  acids  were  measured 
by  LC-MS  at  the  Koch  Institute  of  the  Massachusetts  Institute  of  Technology 
(Cambridge, MA) using methods similar to those used for assessment of metabolites in
human  plasma.  Raw  data  were  analyzed  as  peak  area  tops  using  the  open-access 
MAVEN  software  tool82. 

GC-MS  assessment  of  13C-leucine  and  13C-valine  labeling.  Plasma  polar 
metabolites were extracted in ice-cold 4:1 methanol/water with norvaline internal
standard (5 μL plasma in 200 μL extraction solution). Extracts were clarified
by centrifugation and the supernatant evaporated under nitrogen and frozen at
-80 °C for subsequent derivatization. Dried polar metabolites were dissolved in
20 μL of 2% methoxyamine hydrochloride in pyridine (Thermo) and held at
37 °C for 1.5 h. After dissolution and reaction, tert-butyldimethylsilyl derivatization
was initiated by adding 25 μL N-methyl-N-(tert-butyldimethylsilyl)trifluoroacetamide
+ 1% tert-butyldimethylchlorosilane (Sigma) and incubating
at 37 °C for 1 h. The acid hydrolysis protocol was adapted from Antoniewicz
et al.83. Briefly, acid hydrolysis of tissue proteins was performed on snap-frozen
tissues by boiling 1-5 mg tissue in 1 mL 18% hydrochloric acid overnight at
100 °C. 50 μL supernatant was evaporated under nitrogen and frozen at -80 °C
for subsequent derivatization. Dried hydrolysates were re-dissolved in pyridine
(10 μL/1 mg tissue) before tert-butyldimethylsilyl derivatization, which was
initiated by adding N-methyl-N-(tert-butyldimethylsilyl)trifluoroacetamide +
1% tert-butyldimethylchlorosilane (12.5 μL/1 mg tissue, Sigma) and incubating
at 37 °C for 1 h.

GC-MS  analysis  was  performed  using  an  Agilent  7890  GC  equipped  with  a 
30-m  DB-35MS  capillary  column  connected  to  an  Agilent  5975B  MS  operating 
under  electron  impact  ionization  at  70  eV.  One  microliter  of  sample  was  injected 
in splitless mode at 270 °C, using helium as the carrier gas at a flow rate of 1 mL
min^-1. For measurement of amino acids, the GC oven temperature was held at
100 °C for 3 min and increased to 300 °C at 3.5 °C min^-1. The MS source and
quadrupole were held at 230 °C and 150 °C, respectively, and the detector was
run in scanning mode, recording ion abundance in the range of 100-605 m/z.
Mass isotopomer distributions (MIDs) were determined by integrating the appropriate ion fragments83 listed in
Supplementary Note, Table S7 and corrected for natural isotope abundance
using an algorithm adapted from Fernandez et al.84.
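
The idea behind natural-abundance correction can be illustrated with a deliberately simplified sketch. It is not the algorithm of reference 84: it models only natural 13C in the carbons of the analyte itself and ignores the isotopes of H, N, O and Si and the carbons contributed by the tert-butyldimethylsilyl groups, all of which a real correction must include. The function names and the abundance constant are assumptions made for illustration.

```python
from math import comb

C13_NATURAL_ABUNDANCE = 0.0107  # approximate natural 13C fraction (assumed value)


def correction_matrix(n_carbons, p=C13_NATURAL_ABUNDANCE):
    """matrix[i][j]: probability that a molecule carrying j tracer-derived 13C
    atoms is observed at mass shift M+i because k = i - j of its remaining
    (n_carbons - j) carbons are naturally 13C (binomial model)."""
    size = n_carbons + 1
    matrix = [[0.0] * size for _ in range(size)]
    for j in range(size):
        unlabeled = n_carbons - j
        for k in range(unlabeled + 1):
            matrix[j + k][j] = comb(unlabeled, k) * p ** k * (1 - p) ** (unlabeled - k)
    return matrix


def correct_mid(measured, n_carbons, p=C13_NATURAL_ABUNDANCE):
    """Solve measured = matrix @ corrected by forward substitution
    (the matrix is lower triangular), then clip and renormalize."""
    if len(measured) != n_carbons + 1:
        raise ValueError("measured MID must have n_carbons + 1 entries")
    matrix = correction_matrix(n_carbons, p)
    corrected = []
    for i in range(n_carbons + 1):
        residual = measured[i] - sum(matrix[i][j] * corrected[j] for j in range(i))
        corrected.append(residual / matrix[i][i])
    corrected = [max(x, 0.0) for x in corrected]  # remove small negative values from noise
    total = sum(corrected)
    return [x / total for x in corrected]
```

A round trip (simulating a measured distribution from a known one with `correction_matrix`, then passing it to `correct_mid`) is a quick way to check that the inversion recovers the input.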

Statistical analyses for mouse studies. Appropriate statistical tests were performed
where required. Two-sided unpaired Student's t-tests were performed
for all statistical analyses unless otherwise specified using Microsoft Excel for
Mac 2011 (Microsoft) or GraphPad Prism 6 (GraphPad Software). Two-sided
repeated-measures analysis of variance was performed to compare mean plasma
glucose levels in the glucose tolerance test and insulin tolerance tests85, using
the SAS 9.2 statistical package. No statistical method was used to predetermine
sample size.
Background The insulin-like growth factor (IGF) signaling pathway has been implicated in prostate cancer (PCa) initiation, but
its  role  in  progression  remains  unknown. 

Methods  Among  5887  PCa  patients  (704  PCa  deaths)  of  European  ancestry  from  seven  cohorts  in  the  National  Cancer 
Institute  Breast  and  Prostate  Cancer  Cohort  Consortium,  we  conducted  Cox  kernel  machine  pathway  analysis  to 
evaluate  whether  530  tagging  single  nucleotide  polymorphisms  (SNPs)  in  26  IGF  pathway-related  genes  were 
collectively  associated  with  PCa  mortality.  We  also  conducted  SNP-specific  analysis  using  stratified  Cox  models 
adjusting  for  multiple  testing.  In  2424  patients  (313  PCa  deaths),  we  evaluated  the  association  of  prediagnostic 
circulating IGF1 and IGFBP3 levels with PCa mortality. All statistical tests were two-sided.

Results  The  IGF  signaling  pathway  was  associated  with  PCa  mortality  ( P  =  .03),  and  IGF2-AS  and  SSTR2  were  the  main 
contributors (both P = .04). In SNP-specific analysis, 36 SNPs were associated with PCa mortality with Ptrend less
than .05, but only three SNPs in the IGF2-AS remained statistically significant after gene-based corrections. Two
were in linkage disequilibrium (r2 = 1 for rs1004446 and rs3741211), whereas the third, rs4366464, was independent
(r2 = 0.03). The hazard ratios (HRs) per each additional risk allele were 1.19 (95% confidence interval [CI] = 1.06
to 1.34; Ptrend = .003) for rs3741211 and 1.44 (95% CI = 1.20 to 1.73; Ptrend < .001) for rs4366464. rs4366464 remained
statistically significant after correction for all SNPs (Ptrend.corr = .04). Prediagnostic IGF1 (HR highest vs lowest quartile = 0.71;
95% CI = 0.48 to 1.04) and IGFBP3 (HR = 0.93; 95% CI = 0.65 to 1.34) levels were not associated with PCa mortality.

Conclusions  The  IGF  signaling  pathway,  primarily  IGF2-AS  and  SSTR2  genes,  may  be  important  in  PCa  survival. 

JNCI J  Natl  Cancer  Inst  (2014)  106(5):  dju085 

Abundant  experimental  evidence  indicates  that  the  insulin-like 
growth  factor  (IGF)  signaling  pathway  is  important  for  cell  survival 
and tumorigenesis (1,2). Epidemiological research, focused primarily
on IGF1 and IGF binding protein 3 (IGFBP3) and risk of incident
prostate cancer, suggests that higher circulating IGF1 levels are
associated with increased risk of prostate cancer (3), with mixed
findings for IGFBP3 levels (4). However, little is known about the
role of prediagnostic circulating levels of IGF1 and/or IGFBP3 in
prostate  cancer  survival. 

Data on genetic variations in IGF-related genes and prostate
cancer survival are sparse, limited by the relatively small number of
fatal outcomes and assessment of only a handful of single nucleotide
polymorphisms (SNPs) related to risk of prostate cancer, as
identified by tagging SNPs or from genome-wide association studies
(5,6). To the best of our knowledge, a systematic evaluation of
genetic variants of IGF pathway-related genes and progression to
fatal prostate cancer is lacking.

The National Cancer Institute Breast and Prostate Cancer
Cohort Consortium (BPC3), which pools data from multiple large
cohort studies, was designed to examine associations of variations
in genes that mediate the steroid hormone and IGF signaling
pathways with breast and prostate cancer risk (7). With an average
of 8.9 years of follow-up among 5887 prostate cancer patients
of European ancestry in BPC3, we aimed to 1) use a novel kernel
machine pathway analysis and SNP-specific analysis to evaluate
whether common variations among 26 genes involved in the synthesis,
metabolism, and regulation of IGFs were associated with
prostate cancer mortality; and 2) investigate the associations of
prediagnostic circulating IGF1 and IGFBP3 levels with prostate
cancer mortality in a subset of 2424 patients.
Methods

Study  Population 

The BPC3 consists of seven nested case-control studies of prostate
cancer from prospective cohort studies in the United States
and  Europe:  Alpha-Tocopherol,  Beta-Carotene  Cancer  Prevention 
Study  (ATBC),  American  Cancer  Society  Cancer  Prevention  Study 
II  (CPS-II),  European  Prospective  Investigation  into  Cancer  and 
Nutrition  (EPIC),  Health  Professionals  Follow-up  Study  (HPFS), 
Multiethnic  Cohort  Study  (MEC),  Physicians’  Health  Study 
(PHS),  and  Prostate,  Lung,  Colorectal,  and  Ovarian  (PLCO) 
Cancer  Screening  Trial  (7).  Prostate  cancer  case  patients  were 
ascertained  through  population-based  registries,  self-report,  or 
death  certificates  and  verified  by  medical  records.  Height,  body 
weight, and family history of prostate cancer were obtained by self-report.
Data on disease stage (Jewett-Whitmore classification) and
grade  (Gleason  score)  were  collected  from  each  cohort.  Written 
informed  consent  was  obtained  from  all  subjects,  and  each  study 
was  approved  by  the  institutional  review  boards  at  their  respective 
institutions.  Details  of  vital  status  follow-up  and  determination  of 
cause of death are described in the Supplementary Methods (available
online).


SNP  Selection  and  Genotyping 

A total of 590 SNPs in 26 genes involved in the synthesis, metabolism,
and regulation of insulin-like growth factors were genotyped
(Figure  1).  After  restricting  to  self-reported  European  ancestry,  a 
total  of  5887  prostate  cancer  patients  were  included  in  this  analysis. 
Two  approaches  were  taken  to  evaluate  linkage  disequilibrium  (LD) 
patterns  and  select  the  SNPs  for  this  analysis  as  described  elsewhere 
(7,8).  Genotyping  was  performed  in  six  laboratories:  National 
Cancer  Institute  Core  Genotyping  Facility  (Gaithersburg,  MD), 
University  of  Southern  California  (Los  Angeles,  CA),  University  of 
Hawaii  (Honolulu,  HI),  Harvard  School  of  Public  Health  (Boston, 
MA),  Imperial  College  (London,  UK),  and  Cambridge  University 
(Cambridge, UK). A total of 40 SNPs from GNRH1, GNRHR,
IGF1, IGFBP1, and IGFBP3 were genotyped using TaqMan
(Applied  Biosystems,  Foster  City,  CA).  The  remaining  SNPs  were 
genotyped by the Illumina GoldenGate platform (San Diego, CA).
Interlaboratory concordance was evaluated by genotyping 94 samples
from the SNP500Cancer panel (9) for the TaqMan SNPs and
30  HapMap  CEU  (Utah  residents  with  ancestry  from  northern  and 
western  Europe)  trios  for  the  Illumina  panel,  with  concordance 
rates  greater  than  99%  between  laboratories. 


Figure 1. IGF signaling pathway. Genes included in this analysis were SST, SSTR1-5, GHRH, GHRHR, GHR, IGF1, IGF1R, IGFBP1-6, IGF2-AS, IGF2R,
IGFALS, INSR, IRS1, IRS2 (shown in Figure 1) and POU1F1, GNRH1, and GNRHR (not shown); the insulin receptor is encoded by a single gene, INSR,
from which alternate splicing during transcription results in either IRA or IRB isoforms; the insulin gene (INS) was not genotyped, and genes in the
PI3K/Akt/mTOR and Ras-MAPK pathways were not included in this analysis.
Genotype data from the TaqMan and Illumina platforms were filtered
separately. Any sample in which more than 25% of the SNPs
attempted on a given platform failed was removed from the dataset.
Within each study, any SNP that failed in 25% or more of the
samples, exhibited a statistically significant (P < 10^-5) deviation from
Hardy-Weinberg proportions among European-ancestry controls,
or had a minor allele frequency less than 1% was removed from the
dataset. SNPs that were missing in more than 25% of the prostate
cancer patients or showed large differences in allele frequency among
subjects with European ancestry across studies (fixation index Fst >
0.02) were also excluded from analysis. For each gene region, SNPs
that were polymorphic in any of the HapMap reference panels were
imputed using MACH (10). Genotypes were imputed by cohort using
the CEPH (Utah residents with ancestry from northern and western
Europe) European (CEU) reference panel for subjects of European
ancestry (release No. 21). Imputed data were filtered by study, and
poorly imputed SNPs (r2 < 0.3) were removed from analysis.
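
The within-study SNP exclusion rules described above (failure in 25% or more of samples, minor allele frequency below 1%, Hardy-Weinberg P < 10^-5 among controls) can be sketched as a simple filter. The text does not state which Hardy-Weinberg test was used; the 1-degree-of-freedom chi-square test below is an assumption, and all function names and counts are hypothetical.

```python
from math import erfc, sqrt


def hwe_p_value(n_aa, n_ab, n_bb):
    """Chi-square (1 df) test of Hardy-Weinberg proportions from genotype counts."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        return 1.0
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 degree of freedom.
    return erfc(sqrt(chi2 / 2))


def passes_snp_qc(genotype_counts, n_failed, control_counts,
                  max_fail_rate=0.25, min_maf=0.01, hwe_alpha=1e-5):
    """Return True if a SNP survives the three within-study filters."""
    n_called = sum(genotype_counts)
    n_total = n_called + n_failed
    if n_called == 0 or n_failed / n_total >= max_fail_rate:
        return False  # failed in 25% or more of the samples
    n_aa, n_ab, _ = genotype_counts
    freq_a = (2 * n_aa + n_ab) / (2 * n_called)
    if min(freq_a, 1 - freq_a) < min_maf:
        return False  # minor allele frequency below 1%
    return hwe_p_value(*control_counts) >= hwe_alpha
```

Each argument is a tuple of genotype counts in the order (AA, AB, BB); the Hardy-Weinberg check uses the control counts only, as in the text.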

Circulating  IGF1  and  IGFBP3  levels 

Prediagnostic measurements of IGF1 and IGFBP3 were available for
five of the seven cohorts (ATBC, EPIC, HPFS, PHS, and PLCO;
n = 2445) (11-15). Details of sample collection and storage were
described previously. Samples from ATBC, HPFS, and PHS were
measured in the Pollak laboratory (McGill University, Montreal, QC,
Canada), and the remaining studies were measured in the laboratory of
the Hormones and Cancer Team at the International Agency for Research
on Cancer (IARC) with enzyme-linked immunosorbent assays
(Diagnostic System Laboratories, Webster, TX). We excluded cohort
and assay batch-specific statistical outliers (n = 21) based on the generalized
extreme studentized deviate many-outlier detection approach,
setting alpha to 0.05 for both IGF1 and IGFBP3 blood levels (16).

Statistical  Analysis 

IGF Gene Pathway. The kernel machine Cox regression framework
(17,18), a novel and comprehensive approach for pathway
analysis of censored survival outcomes, was used to assess associations
with deaths from prostate cancer and other causes for SNP
sets defined by all 26 genes in the IGF pathway and each gene
individually after adjusting for continuous age and study cohort.
Because genotyped SNPs may be imperfect surrogates for the true
causal SNP, their individual relative risks are likely to be modest,
and a multimarker global test will more effectively capture the true
effect. The kernel machine accounts for LD in an SNP set, leading
to a powerful test with reduced degrees of freedom. More attractively,
it can also capture potential nonlinear SNP effects, SNP-SNP
interactions (epistasis), and the joint effects of multiple causal
variants without requiring a priori knowledge of directionality. The
kernel machine tests whether an SNP set is associated with the event
time of interest after adjusting for covariables, and the test statistic
under the null follows a mixture of χ2 distributions, which can be
approximated by resampling methods. Logistic kernel machines
have been applied to a variety of traits and diseases (19,20).

SNP-specific analyses were conducted by stratified Cox proportional
hazards models under a log-additive hazards assumption
and stratified by study cohort, allowing different baseline hazards
for each study. Follow-up was defined from the date of prostate
cancer diagnosis to the date of any death or last follow-up. The
assumption of proportionality was verified by testing each SNP
and time since diagnosis, and no violation was identified. All analyses
were adjusted for age at diagnosis and further adjusted for stage
and Gleason score at diagnosis. To correct for multiple testing with
the possible presence of LD, the number of effective SNPs, Meff, was
calculated for each gene using a spectral decomposition approach
(21). For gene-based P value correction, nominal P values for each
SNP were multiplied by the Meff for the gene. For the pathway-based
correction, the Meff values for all 26 genes were summed to
correct the P values.
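
The two levels of correction amount to simple arithmetic once the Meff values are known. The sketch below assumes the per-gene Meff values have already been obtained from the spectral decomposition (not reproduced here) and caps corrected P values at 1, which the text does not state explicitly. The gene labels, Meff values, and nominal P value are hypothetical.

```python
def meff_corrected_p(nominal_p, meff):
    """Multiply a nominal P value by the effective number of SNPs, capped at 1."""
    return min(1.0, nominal_p * meff)


def pathway_meff(gene_meffs):
    """Pathway-level Meff: the sum of the per-gene Meff values."""
    return sum(gene_meffs.values())


# Hypothetical per-gene effective SNP counts and a hypothetical nominal P value.
gene_meffs = {"GENE_A": 5.0, "GENE_B": 12.0}
nominal_p = 0.001

gene_level_p = meff_corrected_p(nominal_p, gene_meffs["GENE_A"])        # 0.001 x 5
pathway_level_p = meff_corrected_p(nominal_p, pathway_meff(gene_meffs))  # 0.001 x 17
```

Because the pathway-level Meff is a sum over all genes, the pathway-based correction is always at least as stringent as the gene-based one.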

Cumulative incidence of prostate cancer death by years since
diagnosis was plotted for statistically significant SNPs after
gene-level-based correction using competing-risks regression by the
method of Fine and Gray (22).

Stratified analyses of the associations between statistically significant SNPs
and prostate cancer mortality by age at diagnosis (<65 or >65 years),
BMI (<25, 25-30, or >30 kg/m2), Gleason score (2-6, 7, or 8-10),
and stage (A/B or C/D) were conducted under a dominant model
because of limited sample size. To assess effect modification, we
added a product term of each statistically significant SNP with the variables
above and computed P values from the log likelihood ratio test.

Circulating  IGF1  and  IGFBP3  Levels.  We  created  batch-specific 
(n = 10) quartiles for IGF1 and IGFBP3 and assessed their associations
with prostate cancer mortality simultaneously by stratified
Cox  proportional  hazards  models  adjusting  for  age  at  diagnosis. 
Models  were  also  additionally  adjusted  for  BMI  assessed  at  the 
baseline  of  each  study  to  assess  possible  confounding  or  stage  and 
Gleason  score  at  diagnosis  to  evaluate  possible  mediation.  Tests 
for  trend  were  done  by  treating  the  median  concentration  for  each 
quartile  as  a  continuous  variable.  Stratified  analysis  by  stage  and 
Gleason  score  at  diagnosis  were  also  performed.  To  account  for 
the  possibility  of  reverse  causation  in  which  an  undiagnosed  tumor 
could  affect  biomarker  levels,  sensitivity  analyses  were  conducted 
by  excluding  cases  diagnosed  within  2  years  of  blood  draw. 

Analyses  were  conducted  using  SAS  9.2  (SAS  Institute,  Cary, 
NC), R (The R Foundation for Statistical Computing;
http://www.r-project.org/foundation/), and Stata 12 (StataCorp, College
Station,  TX).  All  statistical  tests  were  two-sided.  A  P  value  of  less 
than  .05  was  considered  statistically  significant. 

Results

During an average follow-up of 8.9 years among the 5887 case
patients, 1999 patients died, 704 of whom had prostate cancer as
the underlying cause of death. Among the 2424 men in the biomarker
analysis subgroup, 313 of the 810 deaths were due to
prostate cancer. Compared with those who were either alive at last
follow-up or had died from other causes, patients who died from
prostate cancer had higher Gleason score and clinical stage at diagnosis
but similar BMI (Table 1; Supplementary Table 1, available
online).

IGF  Gene  Pathway  and  Prostate  Cancer  Mortality 

Pathway Analysis. A total of 530 SNPs were included in the
genetic analysis. Kernel machine pathway analysis suggests that
this set of SNPs covering all 26 genes in the IGF signaling pathway
was associated with prostate cancer mortality (P = .03) (Table 2).
When testing the SNP set of each gene, IGF2-AS (9 SNPs; P = .04)
and SSTR2 (14 SNPs; P = .04) showed statistically significant associations
with prostate cancer mortality. The overall pathway P values
were .05 without either IGF2-AS or SSTR2 and .08 without
both IGF2-AS and SSTR2, suggesting both IGF2-AS and SSTR2
may contribute to the progression to fatal prostate cancer. Neither
the overall pathway nor IGF2-AS or SSTR2 was associated with
risk of dying from causes other than prostate cancer.

Table 1. Characteristics of prostate cancer patients in the National Cancer Institute Breast and Prostate Cancer Cohort Consortium*

Characteristic                                               PCa death (n = 704)   Censored (n = 5183)   Total (n = 5887)
Age at diagnosis, y, mean (SD)                               69.1 (7.1)            68.3 (6.4)            68.4 (6.5)
Diagnosis to prostate cancer death/censoring, y, mean (SD)   5.3 (3.8)             9.4 (3.9)             8.9 (4.1)
Body mass index, kg/m2
  18-24.9                                                    265 (38)              2030 (39)             2295 (39)
  25-29.9                                                    342 (49)              2393 (46)             2735 (46)
  >30                                                        78 (11)               572 (11)              650 (11)
  Missing                                                    19 (2)                188 (4)               207 (4)
Family history
  Yes                                                        39 (6)                576 (11)              615 (10)
  No                                                         358 (51)              3038 (59)             3396 (58)
  Missing                                                    307 (44)              1569 (30)             1876 (32)
Gleason score
  2-6                                                        115 (16)              2567 (50)             2682 (46)
  7                                                          225 (32)              1465 (28)             1690 (29)
  8-10                                                       217 (31)              485 (9)               702 (12)
  Missing                                                    147 (21)              666 (13)              813 (14)
Stage
  A or B                                                     259 (37)              3801 (73)             4060 (69)
  C or D                                                     343 (49)              702 (14)              1045 (18)
  Missing                                                    102 (14)              680 (13)              782 (13)
Biomarker subcohort
  No. of patients                                            313                   2111                  2424
  Age at blood draw, y, mean (SD)                            64.0 (7.8)            63.0 (6.9)            63.1 (7.1)
  Circulating IGF1, ng/mL, median (IQR)                      161 (124-212)         182 (142-228)         179 (139-227)
  Circulating IGFBP3, ng/mL, median (IQR)                    3110 (2544-3753)      3613 (2597-4333)      3544 (2899-4290)

* Data are No. (%) unless otherwise specified.

SNP-Specific Analysis. A total of 36 SNPs were associated
with prostate cancer mortality with Ptrend < .05 (Supplementary
Table 2, available online). After correcting for multiple testing
at the gene level, three SNPs, all in the IGF2 antisense gene (IGF2-AS,
11p15.5), were statistically significantly associated with prostate
cancer-specific mortality. Two of these SNPs, rs1004446 (intron)
and rs3741211 (3'-UTR), were in LD with each other (r2 = 1 in the
1000 Genomes CEU population) but independent of the third
SNP, rs4366464 (intron) (r2 = 0.03). For rs3741211, each additional
A allele was associated with a 19% (hazard ratio [HR] = 1.19; 95%
confidence interval [CI] = 1.06 to 1.34; Ptrend = .003) increased
risk of prostate cancer-specific mortality. For rs4366464, each
additional minor allele G was associated with a 44% (HR = 1.44;
95% CI = 1.20 to 1.73) increased risk of prostate cancer mortality
(Ptrend < .001) (Table 3; Supplementary Figure 1, available
online). The association for rs4366464 remained statistically significant
after further correcting for multiple testing of all SNPs
(Ptrend.corr = .04; Meff = 424). When mutually adjusted for each other,
the hazard ratios remained similar for rs3741211 (HR = 1.15; 95%
CI = 1.03 to 1.30) and rs4366464 (HR = 1.37; 95% CI = 1.13 to
1.67), suggesting independent additive effects of the two SNPs
on prostate cancer progression. Cohort-specific associations
(Figure 2) also indicated the robustness of these associations, and
minimal heterogeneity was observed (rs3741211: I2 < 0.05%,
Pheterogeneity = .44; rs4366464: I2 < 0.05%, Pheterogeneity = .55).
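
As a worked illustration of how the per-allele estimates can be read, the sketch below recovers the log hazard ratio and an approximate standard error from a reported HR and 95% CI, and applies the log-additive assumption stated in the Statistical Analysis section to obtain the implied hazard ratio for carriers of two risk alleles. It assumes Wald-type confidence intervals (z = 1.96); the function names are hypothetical, and the two-allele values are implied by the model rather than reported in the paper.

```python
from math import log


def log_hr_and_se(hr, ci_lower, ci_upper, z=1.96):
    """Back out the Cox coefficient and its standard error from HR and 95% CI,
    assuming a symmetric Wald interval on the log scale."""
    beta = log(hr)
    se = (log(ci_upper) - log(ci_lower)) / (2 * z)
    return beta, se


def hr_for_allele_count(hr_per_allele, n_alleles):
    """Under a log-additive model, each extra allele multiplies the hazard
    by the same factor, so n alleles give HR ** n."""
    return hr_per_allele ** n_alleles


# Reported per-allele estimates for rs3741211 and rs4366464.
beta_3741211, se_3741211 = log_hr_and_se(1.19, 1.06, 1.34)
hr_two_copies_3741211 = hr_for_allele_count(1.19, 2)  # 1.19 squared, about 1.42
hr_two_copies_4366464 = hr_for_allele_count(1.44, 2)  # 1.44 squared, about 2.07
```

These two-allele figures carry the same uncertainty as the per-allele estimates and depend entirely on the log-additive assumption holding.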

Neither rs4366464 nor rs3741211 was statistically significantly
associated with either Gleason score or stage (data not shown). After
additionally adjusting for these clinical parameters, the association
between rs3741211 and prostate cancer death remained unchanged,
whereas the hazard ratio for rs4366464 was slightly attenuated. Neither
rs3741211 nor rs4366464 was associated with risk of dying from other
causes (Table 3). These data suggest that the associations between the
two SNPs in IGF2-AS and prostate cancer mortality were independent
of tumor characteristics and specific to death from prostate cancer.

Joint effect analysis suggests that for rs3741211, the association
with prostate cancer mortality tended to be stronger among
men with cancer diagnosed at a younger age or patients with BMI
less than 25 kg/m² (Supplementary Figure 2, available online). For
rs4366464, the association was stronger among men diagnosed
at a younger age. For both SNPs, the associations were somewhat
stronger among patients with higher stage (C or D) or higher
Gleason score (>7). However, only the interaction between rs3741211
and stage was statistically significant (P = .02).

Circulating  IGF1  and  IGFBP3  and  Prostate  Cancer 
Mortality 

IGF1 levels were statistically significantly correlated with IGFBP3
(r = 0.52; P < .001). Prediagnostic circulating levels of IGF1 (HR for
highest vs lowest quartile = 0.71; 95% CI = 0.48 to 1.04) and IGFBP3
(HR = 0.93; 95% CI = 0.65 to 1.34) were not associated with prostate cancer
Table 2. IGF pathway analyses for prostate cancer-specific mortality and mortality of other causes by kernel machine*

Gene abbreviation | Gene name | Chromosomal region | No. of SNPs included | P for PCa death† | P for other death†
Pathway
Total pathway | - | - | 530 | .03 | .14
Pathway w/o IGF2-AS | - | - | 521 | .05 | .14
Pathway w/o SSTR2 | - | - | 516 | .05 | .13
Pathway w/o IGF2-AS and SSTR2 | - | - | 507 | .08 | .13
Gene
GHR | Growth hormone receptor | 5p13-p12 | 34 | .16 | .61
GHRH | Growth hormone releasing hormone | 20q11.2 | 9 | .14 | .75
GHRHR | Growth hormone releasing hormone receptor | 7p14 | 26 | .38 | .85
GNRH1 | Gonadotropin-releasing hormone 1 | 8p21-p11.2 | 3 | .86 | .55
GNRHR | Gonadotropin-releasing hormone receptor | 4q21.2 | 6 | .15 | .07
IGF1 | Insulin-like growth factor 1 | 12q23.2 | 14 | .35 | .12
IGF1R | Insulin-like growth factor 1 receptor | 15q26.3 | 112 | .36 | .20
IGF2-AS | IGF2 antisense RNA | 11p15.5 | 9 | .04 | .40
IGF2R | Insulin-like growth factor 2 receptor | 6q26 | 68 | .09 | .16
IGFALS | Insulin-like growth factor binding protein, acid labile subunit | 16p13.3 | 7 | .34 | .54
IGFBP1 | Insulin-like growth factor binding protein 1 | 7p13-p12 | 7 | .29 | .22
IGFBP2,5 | Insulin-like growth factor binding protein 2 and 5 | 2q33-q36 | 36 | .06 | .34
IGFBP3 | Insulin-like growth factor binding protein 3 | 7p13-p12 | 8 | .77 | .22
IGFBP4 | Insulin-like growth factor binding protein 4 | 17q12-q21.1 | 7 | .69 | .67
IGFBP6 | Insulin-like growth factor binding protein 6 | 12q13 | 7 | .76 | .54
INSR | Insulin receptor | 19p13.3-p13.2 | 53 | .07 | .19
IRS1 | Insulin receptor substrate 1 | 2q36 | 8 | .53 | .85
IRS2 | Insulin receptor substrate 2 | 13q34 | 13 | .38 | .82
POU1F1 | POU class 1 homeobox 1 | 3p11 | 6 | .55 | .42
SST | Somatostatin | 3q28 | 16 | .66 | .03
SSTR1 | Somatostatin receptor 1 | 14q13 | 19 | .31 | .06
SSTR2 | Somatostatin receptor 2 | 17q24 | 14 | .04 | .50
SSTR3 | Somatostatin receptor 3 | 22q13.1 | 18 | .96 | .96
SSTR4 | Somatostatin receptor 4 | 20p11.2 | 26 | .24 | .41
SSTR5 | Somatostatin receptor 5 | 16p13.3 | 4 | .78 | .91

* PCa = prostate cancer; SNP = single nucleotide polymorphism.

† P values were calculated using the kernel machine Cox regression framework and were two-sided.


mortality in the model mutually adjusted for each other and age at
diagnosis (Table 4). The hazard ratios were similar after additionally
adjusting for stage and Gleason score at diagnosis in the model,
or BMI at baseline, or excluding IGF1 and IGFBP3 measurements
within 2 years of prostate cancer diagnosis (data not shown). In
subgroup analysis, higher IGF1 levels were statistically significantly
associated with lower prostate cancer mortality (Ptrend = .02) among
men diagnosed with more advanced tumors (stage C or D).

Discussion 

To the best of our knowledge, this analysis of IGF pathway genes
in relation to prostate cancer mortality among prostate cancer
patients is the largest study to date. Using the kernel machine
pathway analysis, a powerful test allowing assessment of the joint
associations of variants in a predefined pathway, we demonstrated
that the IGF pathway was statistically significantly associated with
prostate cancer mortality and that two genes, IGF2-AS and SSTR2,
may play important roles in prostate cancer progression. Using
SNP-specific association analysis, we further identified two SNPs,
rs3741211 and rs4366464 in IGF2-AS, that were statistically
significantly associated with prostate cancer mortality.


Additionally, among a subset of 2424 patients, we found no
overall associations between prediagnostic circulating levels of
IGF1 and IGFBP3 and prostate cancer mortality. The null associations
between IGF1 and IGFBP3 genes and prostate cancer mortality
suggest that their roles in the progression of prostate cancer
were limited. In previous analyses of BPC3 patients, genetic
variations in IGF1 and SSTR5 were associated with circulating levels
of IGF1, and IGFBP3 and IGFALS genes were associated with
IGFBP3 levels (8,23). However, none of the SNPs in these genes
were associated with prostate cancer mortality in our analysis,
which is in line with the null findings between circulating levels of
IGF1 and IGFBP3 and prostate cancer mortality. Although these
findings should be interpreted with caution given the heterogeneity
in blood collection, sample storage, and assay variation across
the cohorts, the findings are not surprising because recent
prospective studies did not support stronger associations of IGF1 levels
with risk of advanced prostate cancer, favoring the hypothesis that
common germline variations or circulating levels of IGF1 may
contribute to the early stages of prostate carcinogenesis (4) but not
to progression.

The  role  of  IGF2-AS  and  IGF2  in  prostate  cancer  initiation 
and  progression  is  largely  underexplored.  A  previous  genome-wide 
Table 3. Single nucleotide polymorphisms in IGF2-AS associated with prostate cancer-specific mortality after gene-based P value correction*

rs3741211 (risk allele A; RAF 0.626; chromosomal region 11p15.5; position 2169110)
Genotype | Person-years | PCa death, No. | PCa death, HR (95% CI)† | PCa death, HR (95% CI)‡ | Other death, No. | Other death, HR (95% CI)† | Other death, HR (95% CI)‡
GG | 6945 | 64 | 1.00 (referent) | 1.00 (referent) | 182 | 1.00 (referent) | 1.00 (referent)
GA | 23 425 | 312 | 1.40 (1.07 to 1.83) | 1.43 (1.09 to 1.87) | 558 | 0.89 (0.75 to 1.05) | 0.89 (0.75 to 1.05)
AA | 19 562 | 295 | 1.55 (1.18 to 2.03) | 1.50 (1.15 to 1.97) | 500 | 0.93 (0.79 to 1.11) | 0.93 (0.78 to 1.10)
AA/GA | 42 987 | 607 | 1.47 (1.13 to 1.90) | 1.46 (1.13 to 1.90) | 1058 | 0.91 (0.78 to 1.07) | 0.90 (0.77 to 1.06)
per allele | - | - | 1.19 (1.06 to 1.34) | 1.16 (1.04 to 1.30) | - | 0.99 (0.91 to 1.07) | 0.98 (0.90 to 1.07)
Ptrend§ | - | - | .003 | .01 | - | .75 | .66
Ptrend.corr§ | - | - | .02 | .08 | - | 1.00 | 1.00

rs4366464 (risk allele G; RAF 0.066; chromosomal region 11p15.5; position 2164799)
Genotype | Person-years | PCa death, No. | PCa death, HR (95% CI)† | PCa death, HR (95% CI)‡ | Other death, No. | Other death, HR (95% CI)† | Other death, HR (95% CI)‡
CC | 44 133 | 566 | 1.00 (referent) | 1.00 (referent) | 1099 | 1.00 (referent) | 1.00 (referent)
GC | 6596 | 117 | 1.39 (1.14 to 1.70) | 1.32 (1.08 to 1.62) | 157 | 0.96 (0.82 to 1.14) | 0.96 (0.81 to 1.14)
GG | 169 | 6 | 2.87 (1.28 to 6.44) | 2.34 (1.04 to 5.25) | 7 | 1.93 (0.92 to 4.08) | 1.88 (0.89 to 3.97)
GG/GC | 6764 | 123 | 1.43 (1.18 to 1.74) | 1.35 (1.11 to 1.65) | 164 | 0.99 (0.84 to 1.16) | 0.98 (0.83 to 1.16)
per allele | - | - | 1.44 (1.20 to 1.73) | 1.36 (1.13 to 1.63) | - | 1.01 (0.86 to 1.18) | 1.00 (0.86 to 1.18)
Ptrend§ | - | - | .0001 | .001 | - | .92 | .96
Ptrend.corr§ | - | - | .0008 | .01 | - | 1.00 | 1.00

* CI = confidence interval; HR = hazard ratio; PCa = prostate cancer; SNP = single nucleotide polymorphism; RAF = risk allele frequency in patients who did not die from prostate cancer.

† The Cox model was stratified by study cohort and adjusted for age at diagnosis.

‡ The model was additionally adjusted for Gleason score and stage at diagnosis. Because adding body mass index to the multivariable model did not alter the hazard ratios, we decided not to present results adjusted for body mass index.

§ Ptrend values were calculated using stratified Cox proportional hazards models under a log-additive hazards assumption and were two-sided. Ptrend.corr values were Ptrend after gene-based correction for multiple testing (Meff = 8).


[Figure 2A: forest plot, rs3741211. Columns: Study, RAF, Fatal, Censored, HR (95% CI); x-axis: hazard ratio.]

Study | RAF | Fatal | Censored
CPSII | 0.617 | 49 | 1162
ATBC | 0.641 | 273 | 705
EPIC | 0.628 | 114 | 545
HPFS | 0.688 | 44 | 468
MEC | 0.593 | 22 | 398
PHS | 0.602 | 149 | 835
PLCO | 0.629 | 20 | 845
Pooled | 0.626 | 671 | 4758 (pooled HR = 1.19; 95% CI = 1.06 to 1.34)

[Figure 2B: forest plot, rs4366464. Columns: Study, RAF, Fatal, Censored, HR (95% CI).]

Figure 2. Association of IGF2-AS single nucleotide polymorphisms
rs3741211 and rs4366464 with prostate cancer-specific mortality by study
cohort. Hazard ratios (HRs; diamonds) and 95% confidence intervals (CIs;
error bars) calculated for the individual studies and the
pooled analysis for rs3741211 (A) and rs4366464 (B) are shown. The size of
each gray square represents the percentage weight of each study. RAF = risk allele
frequency. ATBC = Alpha-Tocopherol, Beta-Carotene Cancer Prevention
Study; CI = confidence interval; CPS-II = American Cancer Society Cancer
Prevention Study II; EPIC = European Prospective Investigation into Cancer
and Nutrition; HPFS = Health Professionals Follow-up Study; HR = hazard
ratio; MEC = Multiethnic Cohort Study; PHS = Physicians’ Health Study;
PLCO = Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial.


association study identified SNP rs7127900 in IGF2-AS as associated
with risk of incident prostate cancer (24) but not with prostate
cancer mortality (5). This SNP was not in LD with the two SNPs
we identified (r² = 0.01 for rs3741211 and r² = 0.003 for rs4366464
in 1000 Genome CEU population).


IGF2 is a peptide growth factor that is homologous to both
IGF1 and insulin; interaction of IGF2 with insulin receptor
subtype A (IRA) may play a role both in fetal growth and cancer
biology (25). IGF2-AS expresses a paternally imprinted antisense
transcript of the IGF2 gene. It is transcribed in the opposite
Table 4. Circulating levels of IGF1 and IGFBP3 and prostate cancer-specific mortality in the National Cancer Institute Breast and Prostate Cancer Cohort Consortium

Quartile* columns: Q1 to Q4, each with fatal/censored counts and HR (95% CI).

Outcome/biomarker | Q1 fatal/censored | Q1 HR (95% CI) | Q2 fatal/censored | Q2 HR (95% CI) | Q3 fatal/censored | Q3 HR (95% CI) | Q4 fatal/censored | Q4 HR (95% CI) | Ptrend§
All cases, Model 1†
IGF1 | 101/501 | 1.00 (referent) | 80/529 | 0.86 (0.63 to 1.17) | 68/541 | 0.74 (0.53 to 1.05) | 64/540 | 0.71 (0.48 to 1.04) | .08
IGFBP3 | 102/499 | 1.00 (referent) | 77/534 | 0.83 (0.61 to 1.13) | 59/549 | 0.67 (0.47 to 0.95) | 75/529 | 0.93 (0.65 to 1.34) | .35
All cases, Model 2‡
IGF1 | 101/501 | 1.00 (referent) | 80/529 | 0.84 (0.62 to 1.14) | 68/541 | 0.77 (0.55 to 1.09) | 64/540 | 0.77 (0.52 to 1.14) | .18
IGFBP3 | 102/499 | 1.00 (referent) | 77/534 | 0.77 (0.57 to 1.06) | 59/549 | 0.59 (0.41 to 0.84) | 75/529 | 0.93 (0.65 to 1.35) | .28
Stage A or B†
IGF1 | 35/357 | 1.00 (referent) | 23/375 | 0.81 (0.46 to 1.41) | 27/363 | 1.05 (0.59 to 1.89) | 19/374 | 0.75 (0.37 to 1.53) | .53
IGFBP3 | 39/347 | 1.00 (referent) | 25/376 | 0.65 (0.37 to 1.12) | 13/381 | 0.37 (0.19 to 0.74) | 27/365 | 0.77 (0.41 to 1.46) | .23
Stage C or D†
IGF1 | 48/56 | 1.00 (referent) | 38/80 | 0.73 (0.46 to 1.16) | 32/91 | 0.58 (0.35 to 0.94) | 30/89 | 0.52 (0.30 to 0.90) | .02
IGFBP3 | 40/64 | 1.00 (referent) | 40/82 | 0.99 (0.63 to 1.55) | 34/83 | 0.91 (0.55 to 1.52) | 34/87 | 1.26 (0.73 to 2.19) | .38
Gleason <7†
IGF1 | 19/246 | 1.00 (referent) | 13/267 | 0.72 (0.34 to 1.53) | 14/272 | 0.78 (0.36 to 1.70) | 12/297 | 0.68 (0.28 to 1.68) | .49
IGFBP3 | 19/247 | 1.00 (referent) | 15/274 | 0.86 (0.42 to 1.79) | 11/283 | 0.68 (0.30 to 1.55) | 13/278 | 0.85 (0.36 to 2.02) | .64
Gleason >7†
IGF1 | 56/174 | 1.00 (referent) | 48/180 | 0.93 (0.62 to 1.41) | 30/188 | 0.64 (0.39 to 1.06) | 35/172 | 0.81 (0.47 to 1.40) | .33
IGFBP3 | 57/169 | 1.00 (referent) | 34/182 | 0.72 (0.46 to 1.12) | 40/179 | 0.85 (0.54 to 1.34) | 38/184 | 0.83 (0.49 to 1.41) | .63

* Batch-specific (n = 10) quartiles were used. All models were stratified by study cohort and simultaneously adjusted for IGF1 and IGFBP3. CI = confidence interval; HR = hazard ratio.

† Adjusted for age at diagnosis.

‡ Adjusted for age, Gleason score, and stage at diagnosis.

§ Ptrend values were calculated by treating the median concentration for each quartile as a continuous variable and were two-sided.
Figure 3. Gene map of IGF2-AS/IGF2/INS region and single nucleotide polymorphisms (SNPs) genotyped in IGF2-AS (n = 9). Only SNPs rs1004446
and rs3741211 have an r² greater than 0.8, indicated by an asterisk (*).


direction to the IGF2 transcripts, with some genomic regions
shared with IGF2 (Figure 3) (26). IGF2-AS and IGF2 were
overexpressed in Wilms’ tumor through loss of imprinting (26,27).
Loss of imprinting of IGF2 is generally manifested by the activation
of the normally silenced maternal allele with the subsequent
expression of both gene copies. Evidence from Wilms’ tumor,
colorectal cancer, and ovarian cancer suggests that the biallelic
IGF2 expression also correlates with aberrant IGF2/H19
methylation (28,29). IGF2 levels were increased in prostate
tumor-associated tissues, and a widespread IGF2 loss of imprinting
throughout the peripheral prostate was observed in men with
prostate cancer but not in samples of benign prostatic hyperplasia
or other adult tissues, suggesting that epigenetic modification
may play an important role in prostate carcinogenesis (30).
Overexpression of IGF2 and/or IRA has been proposed as a
potential mechanism of resistance to IGF1R-directed therapies (31).

SSTR2 has been documented in experimental and clinical prostate
cancer research but not in population studies. Somatostatin
exerts inhibitory effects on cancer cells, including prostate, through
five specific G-protein-coupled membrane receptors, SSTR1-5,
with SSTR2 being predominant in human cancers (32,33). Its
analogs, octreotide and lanreotide, which have high affinity for
SSTR2, have been used to treat hormone-refractory prostate
cancers (34,35) but are still under development.

The major strength of this study is the use of a large cohort
consortium to study genetic predispositions, which are less likely
to be affected by screening and treatment. Another strength is our
comprehensive evaluation of genetic variants in the IGF pathway
using pathway, SNP-specific, and study cohort-specific analyses.
However, additional genotyping to narrow down the region harboring
the causal allele, followed by functional work on the identified
variants and validation in other independent studies and/or races/
ethnicities, is necessary. Lack of patient treatment information was
another limitation. However, associations of IGF genetic polymorphisms
or biomarkers with prostate cancer mortality were unlikely
to be affected by treatment because the two SNPs we identified,
rs3741211 and rs4366464, were not associated with tumor characteristics
(stage and Gleason score), the major determinants of treatment.

In summary, in this large consortium analysis of prostate cancer,
both pathway and SNP-specific analyses showed that germline
variations in the IGF2-AS gene were associated with prostate cancer
mortality, independent of stage and Gleason score and specific to
prostate cancer. In contrast, neither genetic polymorphisms nor
prediagnostic circulating levels of IGF1 and IGFBP3 were associated
with prostate cancer mortality. Pathway analysis suggests
that SSTR2 may also play a role in prostate cancer progression,
but SNP-specific analysis failed to show any statistically significant
SNP in this gene after gene-level correction. Further research on
the role of IGF2/IGF2-AS and SSTR2 is needed.
Abstract

Previous studies have associated higher milk intake with greater prostate cancer (PCa) incidence, but few data are
available concerning milk types and the relation between milk intake and risk of fatal PCa. We investigated the association
between intake of dairy products and the incidence and survival of PCa during a 28-y follow-up. We conducted a cohort
study in the Physicians' Health Study (n = 21,660) and a survival analysis among the incident PCa cases (n = 2806).
Information on dairy product consumption was collected at baseline. PCa cases and deaths (n = 305) were confirmed
during follow-up. The intake of total dairy products was associated with increased PCa incidence [HR = 1.12 (95% CI: 0.93,
1.35); >2.5 servings/d vs. ≤0.5 servings/d]. Skim/low-fat milk intake was positively associated with risk of low-grade, early
stage, and screen-detected cancers, whereas whole milk intake was associated only with fatal PCa [HR = 1.49 (95% CI:
0.97, 2.28); ≥237 mL/d (1 serving/d) vs. rarely consumed]. In the survival analysis, whole milk intake remained associated
with risk of progression to fatal disease after diagnosis [HR = 2.17 (95% CI: 1.34, 3.51)]. In this prospective cohort, higher
intake of skim/low-fat milk was associated with a greater risk of nonaggressive PCa. Most importantly, only whole milk
was consistently associated with higher incidence of fatal PCa in the entire cohort and higher PCa-specific mortality among
cases. These findings add further evidence to suggest the potential role of dairy products in the development and
prognosis of PCa. J. Nutr. 143: 189-196, 2013.


Introduction 

Prostate cancer (PCa) is one of the most common cancers
among elderly men (1,2). Dairy product intake has been
associated with higher risk of PCa in many (3-9) but not all
(10-12) studies. In the Physicians' Health Study (PHS), we
previously reported that higher intake of dairy products and
dairy-derived calcium were associated with a greater risk of
developing incident PCa, based on 11 y of follow-up (9).
Compared with men consuming ≤0.5 servings/d of dairy
products, those consuming >2.5 servings/d had a 34%
increase in risk of developing PCa (95% CI: 4%, 71%). In 2
meta-analyses of the relation between dairy product intake
and PCa incidence, one showed a significant positive association
(13), whereas the other reported an overall null association
(14). Part of the reason for this inconsistency could be
that most cohort studies (including our previous report in the
PHS) and the 2 meta-analyses did not separately evaluate
whole milk and skim/low-fat milk. In addition, most studies
did not consider advanced disease or PCa-specific death as a
major outcome, partly due to the variable duration of follow-up.

In  the  present  study,  we  assessed  the  relation  between  intakes 
of  types  of  dairy  products  and  PCa  risk,  with  a  special  emphasis 
on  cases  that  were  high  grade  and  in  advanced  stages  at 
diagnosis  as  well  as  the  occurrence  of  fatal  PCa  during  a  28-y 
follow-up. 


© 2013 American Society for Nutrition.
Participants and Methods

Study population. The PHS was a randomized, blinded, and
placebo-controlled trial of aspirin and β-carotene in the prevention of heart
disease and cancer among 22,071 U.S. male physicians aged 40-84 y in
1982 (15,16). At enrollment, participants provided information in the
enrollment questionnaires on medical history and several lifestyle
factors. All the physicians who were eligible and willing to participate
were enrolled in a run-in phase. After 18 wk, participants were sent a
questionnaire asking about their health status, side effects of treatment,
compliance, and willingness to continue in the trial. Follow-up
questionnaires were mailed at 6 and 12 mo after randomization and annually
thereafter. Participants were asked to report newly diagnosed diseases,
including PCa. For this study, we limited the study population to men
who returned the run-in questionnaires with relevant abbreviated dietary
information. To reduce the potential for undiagnosed PCa to influence
diet and to utilize the dietary data collected on the 12-mo questionnaire,
we excluded PCa cases diagnosed during the first year in the study, men
with BMI <18.5 kg/m² at baseline, and men without baseline BMI
information. These exclusions resulted in a study population of 21,660
men for analysis. The study design and methods used in this investigation
were reviewed and approved by the Institutional Review Board of Partners
Healthcare.

Dietary assessment. The run-in and 12-mo questionnaires in the PHS
included abbreviated FFQs. The run-in questionnaire asked about the
consumption of whole milk, skim/low-fat milk, and cold breakfast cereal
(categories: ≥2 servings/d, daily, 5-6 servings/wk, 2-4 servings/wk,
1 serving/wk, 1-3 servings/mo, rarely/never) in the past year. The 12-mo
questionnaire asked about the intake during the previous year of hard
cheese (e.g., American, Cheddar) and ice cream. We considered these 5
foods to be the main contributors to dairy product intake and combined
those responses by servings to estimate total daily dairy product intake
(9). Because the potential effects of dairy calcium on PCa risk were of
interest, we also calculated total dairy calcium intake from each dairy
product. Calcium content was obtained from the nutrient composition
database of the USDA (17). The calcium content per serving (as weights
in the total calcium consumption) is as follows: whole milk (1 serving =
237 mL), 276 mg; skim/low-fat milk (1 serving = 237 mL), 299 mg; ice
cream (1 serving = 214 g, as in vanilla flavor), 169 mg; and hard cheese (1
serving = 28 g, as an average of American cheese and Cheddar cheese),
173 mg. Two questions about red meat intake were also included in the
12-mo questionnaire, which asked about the intake of beef, pork, or lamb
as a sandwich or mixed dish (hamburger, stew, casserole, lasagna, etc.)
and as a main dish (steak, roast, ham, etc.). Daily intake of red
meat was calculated as the sum of the servings (1 serving = 227 g) for
each of these 2 items.
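As an illustration of the calcium weighting described above, the following minimal sketch multiplies daily servings by the per-serving calcium values quoted in the text and sums them. The dictionary keys and function name are hypothetical, and cold breakfast cereal is omitted because the text gives no calcium value for it.

```python
from typing import Dict

# Per-serving calcium (mg) as quoted in the text; key names are illustrative.
CALCIUM_MG_PER_SERVING: Dict[str, float] = {
    "whole_milk": 276.0,        # 1 serving = 237 mL
    "skim_lowfat_milk": 299.0,  # 1 serving = 237 mL
    "ice_cream": 169.0,
    "hard_cheese": 173.0,       # 1 serving = 28 g
}


def dairy_calcium_mg_per_day(servings_per_day: Dict[str, float]) -> float:
    """Sum servings/day weighted by calcium content per serving."""
    unknown = set(servings_per_day) - set(CALCIUM_MG_PER_SERVING)
    if unknown:
        raise ValueError(f"no calcium value for: {sorted(unknown)}")
    return sum(
        CALCIUM_MG_PER_SERVING[food] * servings
        for food, servings in servings_per_day.items()
    )


# One glass of whole milk per day plus hard cheese every other day.
print(dairy_calcium_mg_per_day({"whole_milk": 1.0, "hard_cheese": 0.5}))
```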

Ascertainment of PCa outcomes. For the PCa incidence analyses, men
were followed from the date when the 12-mo questionnaire was returned
until the date of PCa diagnosis, date of death, or the end of follow-up
(March 9, 2010), whichever came first. For the PCa-specific analyses,
men were followed from the date of PCa diagnosis until the date of death
from PCa, date of death from other causes, or March 9, 2010, whichever
came first. We learned of deaths in the cohort through notification by
family members and postal authorities and through periodic systematic
searches of the National Death Index. Cause of death was determined by
an endpoint committee of 3 physicians based on all available information,
including medical records and death certificates. Follow-up for
mortality was at least 97.7% complete and for morbidity, 95.3% (18).
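The "whichever came first" rule for the incidence analyses can be sketched as follows. This is a minimal illustration, not the study's code; the function name and the reason labels are hypothetical, and only the end-of-follow-up date is taken from the text.

```python
from datetime import date
from typing import Optional, Tuple

END_OF_FOLLOW_UP = date(2010, 3, 9)  # stated in the text


def incidence_follow_up(
    start: date,
    diagnosis: Optional[date] = None,
    death: Optional[date] = None,
    end: date = END_OF_FOLLOW_UP,
) -> Tuple[int, str]:
    """Return (days at risk, stop reason) using the earliest stop date."""
    candidates = [(end, "end_of_follow_up")]
    if diagnosis is not None:
        candidates.append((diagnosis, "pca_diagnosis"))
    if death is not None:
        candidates.append((death, "death"))
    stop, reason = min(candidates, key=lambda pair: pair[0])
    return (stop - start).days, reason


days, reason = incidence_follow_up(date(1983, 6, 1), diagnosis=date(1995, 6, 1))
print(days, reason)
```

Only a stop reason of `pca_diagnosis` counts as an event in the incidence analysis; the other two reasons are censoring.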

Whenever a participant reported a new diagnosis of PCa, we requested
hospital records and pathology reports to confirm the diagnosis and
determine tumor stage, grade, and other clinical characteristics at
diagnosis. Histological grade was recorded following the Gleason scoring system
from the pathology reports. Low-grade tumors were defined as Gleason
≤7 and high-grade tumors as Gleason >7. Clinical stage was
determined using the TNM staging system. Tumors of stage T3 or higher
(T3/T4/N1/M1) were categorized as advanced-stage tumors and tumors
of stage T1 or T2 were defined as early-stage tumors. Cases without
pathologic staging were classified as undetermined stage unless there
was clinical evidence of distant metastases. Because prostate-specific
antigen (PSA) screening has dramatically changed the clinical
presentation of the cancer, we also categorized the cases into 3 groups:
pre-PSA era cases (diagnosed before 1990), post-PSA era cases (diagnosed
1990 or thereafter) who presented with prostatic or metastatic
symptoms, and post-PSA era cases detected by PSA or digital rectal examination
screening.
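The grouping rules in this paragraph can be written down as three small functions. This is a sketch under stated assumptions: the low-grade cut-off is read as Gleason 7 or below, the function and label names are hypothetical, and the inputs are simplified to integers and booleans.

```python
def grade_group(gleason: int) -> str:
    """Low grade: Gleason 7 or below; high grade: Gleason above 7."""
    return "high_grade" if gleason > 7 else "low_grade"


def stage_group(
    t_stage: int,
    node_positive: bool,
    distant_metastasis: bool,
    has_pathologic_staging: bool = True,
) -> str:
    """Advanced: T3/T4/N1/M1; early: T1/T2; otherwise undetermined."""
    if distant_metastasis:
        # Clinical evidence of distant metastases overrides missing staging.
        return "advanced"
    if not has_pathologic_staging:
        return "undetermined"
    if t_stage >= 3 or node_positive:
        return "advanced"
    return "early"


def presentation_group(diagnosis_year: int, screen_detected: bool) -> str:
    """Pre-PSA era (before 1990) vs post-PSA era by mode of detection."""
    if diagnosis_year < 1990:
        return "pre_psa_era"
    return "post_psa_screen_detected" if screen_detected else "post_psa_symptomatic"


print(grade_group(8), stage_group(2, False, False), presentation_group(1996, True))
```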

Statistical analyses. To examine the association of dairy products and
calcium consumption with PCa risk, we used Cox proportional hazards
regression models to calculate the HR and 95% CI, with the lowest
intake category as the reference group. We categorized the intake of each
dairy food into 4 groups (rarely, ≤1 serving/wk, 2-6 servings/wk, and
≥1 serving/d). Calcium intake from dairy products was categorized into
5 groups by quintiles. Tests for linear trend were performed using the
median intake values in each category as a continuous variable. Beyond
age-adjusted models, multivariable models additionally included terms
for baseline (time when 12-mo questionnaire was returned) cigarette
smoking (never, past, or current smoker), vigorous exercise (exercise
vigorously to sweat more than twice per week or not), alcohol intake
(drink alcoholic beverages every day or not), race (Caucasian or
non-Caucasian), BMI (<25.0, 25.0-29.9, or ≥30.0 kg/m²), diabetes status
(yes or no), red meat consumption (servings/week), and assignment in
the original trial (active treatment or placebo for aspirin and β-carotene).
In addition, the models for whole milk and skim/low-fat milk were
mutually adjusted for each other.
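One way to picture the intake categorization and the trend variable is sketched below. The mapping from the seven questionnaire responses to the 4 analysis groups is an assumption made for illustration (the text lists both sets of categories but not the mapping), and the names are hypothetical.

```python
from statistics import median
from typing import Dict, List

# Assumed collapse of questionnaire responses into the 4 analysis groups.
RESPONSE_TO_GROUP: Dict[str, str] = {
    "rarely/never": "rarely",
    "1-3 servings/mo": "<=1 serving/wk",
    "1 serving/wk": "<=1 serving/wk",
    "2-4 servings/wk": "2-6 servings/wk",
    "5-6 servings/wk": "2-6 servings/wk",
    "daily": ">=1 serving/d",
    ">=2 servings/d": ">=1 serving/d",
}


def trend_scores(intake_by_group: Dict[str, List[float]]) -> Dict[str, float]:
    """Median intake per group, to be entered as a continuous trend variable."""
    return {group: median(values) for group, values in intake_by_group.items()}


# Illustrative servings/d for participants in two groups (made-up values).
scores = trend_scores({"rarely": [0.0, 0.0], ">=1 serving/d": [1.0, 1.0, 2.0]})
print(scores)
```

Each participant is then assigned the median of his group, and that single numeric variable replaces the category indicators in the model used for the trend test.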

The  abbreviated  FFQs  in  the  PHS  were  not  comprehensive;  thus,  we 
were  unable  to  calculate  and  adjust  for  total  energy  intake  directly.  To 
minimize  the  potential  confounding  due  to  total  energy  intake,  we 
calculated  total  energy  intake  using  only  the  food  items  that  were 
recorded  in  the  run-in  and  12-mo  questionnaires.  These  food  items 
included  13  types  of  fruits  and  vegetables,  5  types  of  dairy  foods 
investigated  in  this  study,  eggs,  chicken,  beef,  4  types  of  fish  and  seafood, 
cookies,  chips,  nuts,  and  fried  foods.  Under  similar  situations,  previous 
studies  used  food  scores  by  summing  up  servings  of  all  recorded  food 
items (9,19). In this study, we weighted the servings of recorded food
items by the total calories per serving of each individual item to better
emulate total energy intake calculated from comprehensive FFQs.
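The calorie-weighted score described here amounts to a weighted sum over the recorded food items. The sketch below shows the calculation; the function name is hypothetical and the calorie values are placeholders for illustration, not the values used in the study.

```python
from typing import Dict


def energy_score(
    servings_per_day: Dict[str, float],
    kcal_per_serving: Dict[str, float],
) -> float:
    """Approximate energy intake: servings/day weighted by kcal per serving."""
    missing = set(servings_per_day) - set(kcal_per_serving)
    if missing:
        raise KeyError(f"no calorie value for: {sorted(missing)}")
    return sum(
        servings_per_day[item] * kcal_per_serving[item]
        for item in servings_per_day
    )


# Placeholder calorie values (hypothetical, for illustration only).
example_kcal = {"whole_milk": 150.0, "eggs": 75.0}
example_servings = {"whole_milk": 1.0, "eggs": 2.0}
score = energy_score(example_servings, example_kcal)
print(score)
```

An unweighted food score, as in the earlier studies cited, would correspond to setting every calorie value to 1.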

Separate multivariable models for PCa incidence were fit for
subgroups of cancer according to Gleason grade, clinical stage,
disease presentation at diagnosis, and disease fatality during follow-up.
We then modeled the relation between dairy product intake and PCa-specific
mortality among cases using the Cox proportional hazards regression
model. Besides the age- and multivariable-adjusted model [including the
same set of covariates as in the incidence model and stage of tumor (T3/
T4/N1/M1 or T1/T2) and Gleason score (>7 or ≤7)], we further
stratified the analyses by disease presentation at diagnosis (pre-PSA era
presented, post-PSA era presented by symptom, and post-PSA era
presented by screening). To account for potential false positives due to
multiple comparisons, we calculated the false-discovery rate (FDR) by
incorporating all P values from multiple tests performed for the linear
trends. The FDR statistics were obtained for each P value, and FDR
statistics with q < 0.05 were considered significant (20). All analyses
were performed in SAS version 9.3 (SAS Institute). All P values are
2-sided.
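For readers unfamiliar with FDR q values, the following minimal sketch computes them with the Benjamini-Hochberg step-up procedure. Whether reference (20) uses exactly this procedure is an assumption; the analyses themselves were run in SAS, and the P values below are made up for illustration.

```python
from typing import List


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Return Benjamini-Hochberg q values in the same order as the input."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotone q values.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        q_values[index] = running_min
    return q_values


trend_p = [0.01, 0.04, 0.03, 0.20]  # illustrative trend P values
for p, q in zip(trend_p, benjamini_hochberg(trend_p)):
    print(f"P = {p:.2f}  q = {q:.3f}  significant at q < 0.05: {q < 0.05}")
```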


Results 

We  confirmed  2806  incident  cases  of  PCa  diagnosed  among 
21,660  men  in  470,612  person-years  through  2010.  The 
baseline  characteristics  of  the  study  population  by  categories 
of  dairy  product  intake  are  presented  in  Table  1.  Men  who 
consumed  more  dairy  products  tended  to  be  older,  smoked  less, 
drank  less  alcohol,  exercised  more,  and  were  more  likely  to  be 
Caucasian  and  diabetic.  When  stratified  by  type  of  milk,  the  data 
showed  that  men  who  consumed  more  skim/low-fat  milk  tended 
to  smoke  less,  drink  less  alcohol,  and  exercise  more  and  were 
more likely to be Caucasian, whereas men who consumed more
whole milk tended to be current smokers, to exercise less, and to be less
likely to be Caucasian.

TABLE 1 Baseline characteristics by category of baseline dairy product intake in the PHS (n = 21,660) [1]

Dairy = dairy product intake [2], servings/d; whole milk and skim/low-fat milk in servings/wk.

| Characteristic | Dairy ≤0.5/d (n = 3446) | Dairy >0.5-1.0/d (n = 3878) | Dairy >1.0-1.5/d (n = 4527) | Dairy >1.5-2.5/d (n = 6390) | Dairy >2.5/d (n = 3302) | P | Whole milk ≤1/wk (n = 16,618) | Whole milk ≥2/wk (n = 4207) | P | Skim/low-fat milk ≤1/wk (n = 11,834) | Skim/low-fat milk ≥2/wk (n = 9186) | P |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Age, y | 52.3 ± 8.8 | 52.7 ± 9.0 | 53.4 ± 9.4 | 54.3 ± 9.7 | 55.1 ± 10.0 | <0.001 | 53.0 ± 9.1 | 55.5 ± 10.2 | <0.001 | 53.4 ± 9.4 | 53.6 ± 9.4 | 0.12 |
| BMI, % | | | | | | <0.001 | | | 0.08 | | | 0.004 |
| Normal weight | 56 | 55 | 56 | 60 | 61 | | 58 | 56 | | 57 | 59 | |
| Overweight | 40 | 41 | 40 | 36 | 35 | | 38 | 39 | | 39 | 37 | |
| Obese | 4 | 4 | 4 | 4 | 4 | | 4 | 4 | | 4 | 4 | |
| Caucasian, % | 85 | 91 | 93 | 95 | 96 | <0.001 | 93 | 89 | <0.001 | 90 | 95 | <0.001 |
| Diabetes, % | 1.6 | 1.4 | 1.5 | 2.4 | 2.6 | <0.001 | 1.7 | 2.6 | <0.001 | 1.6 | 2.3 | <0.001 |
| Smoking, % | | | | | | <0.001 | | | <0.001 | | | <0.001 |
| Never | 46 | 47 | 49 | 53 | 56 | | 51 | 48 | | 48 | 54 | |
| Former | 41 | 41 | 40 | 39 | 35 | | 40 | 37 | | 41 | 37 | |
| Current | 13 | 12 | 11 | 8 | 9 | | 9 | 15 | | 12 | 8 | |
| Frequent drinker [3], % | 29 | 25 | 24 | 24 | 20 | <0.001 | 24 | 24 | 0.97 | 26 | 21 | <0.001 |
| Vigorous exercise [4], % | 48 | 52 | 54 | 56 | 59 | <0.001 | 55 | 50 | <0.001 | 51 | 58 | <0.001 |
| Red meat intake [5], servings/wk | 0.6 ± 0.4 | 0.7 ± 0.4 | 0.7 ± 0.4 | 0.7 ± 0.4 | 0.8 ± 0.5 | <0.001 | 0.7 ± 0.4 | 0.8 ± 0.5 | <0.001 | 0.7 ± 0.5 | 0.7 ± 0.4 | <0.001 |

[1] Values are percentage or mean ± SE. PHS, Physicians' Health Study.

[2] Based on the consumption of 5 major dairy foods (whole milk, skim/low-fat milk, hard cheese, ice cream, and cold breakfast cereal) assessed from 1982 to 1984. One serving of
whole milk, skim/low-fat milk, or cold breakfast cereal = 237 mL; 1 serving of ice cream = 214 g; 1 serving of hard cheese = 28 g.

[3] Frequent drinker was defined as someone who drinks alcoholic beverages every day.

[4] Vigorous exercise was defined as exercising vigorously to a sweat more than twice per week.

[5] 1 serving of red meat = 227 g.

Total dairy food intake was marginally associated with
overall PCa risk. In multivariable-adjusted analyses, men in the
highest category of total dairy foods had a 12% (95% CI: −7%,
35%) higher risk of developing PCa than men in the lowest intake
category (P-trend = 0.06) (Table 2). For individual dairy foods,
skim/low-fat milk had the strongest association with PCa
incidence: the multivariable-adjusted HR was 1.19 (95% CI:
1.06, 1.33; P-trend = 0.001), comparing the highest [≥237 mL/d
(1 serving/d)] with the lowest (rarely consumed) intake category.
In contrast, whole milk, hard cheese, ice cream, and cold
breakfast cereal intakes were not significantly associated with
overall risk of PCa incidence. Calcium from dairy foods was
marginally associated with PCa incidence (P-trend = 0.07).
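
The percentage figures quoted for total dairy foods follow from the HR and its confidence limits as (HR − 1) × 100. A short sketch using the multivariable-adjusted estimate for the highest intake category, HR = 1.12 (95% CI: 0.93, 1.35); the helper name is ours and is used only for illustration:

```python
def hr_to_percent_change(hazard_ratio):
    """Express a hazard ratio as a percentage change in hazard vs. the reference."""
    return (hazard_ratio - 1.0) * 100.0


# Point estimate and 95% confidence limits for the highest vs. lowest category.
estimate, lower, upper = 1.12, 0.93, 1.35

print(round(hr_to_percent_change(estimate)))  # 12
print(round(hr_to_percent_change(lower)))     # -7
print(round(hr_to_percent_change(upper)))     # 35
```

Because the lower limit corresponds to a 7% lower risk, the interval includes no association, which is consistent with the result being described as marginal.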

We next examined the association of total dairy products,
whole milk, and skim/low-fat milk with special attention to
cancer subtypes and the timing of diagnosis (i.e., 1982-1989,
pre-PSA era vs. 1990-2010, post-PSA era) (Table 3). We found
that higher intake of skim/low-fat milk was mainly associated
with a higher risk of low-grade, early-stage, and screen-detected
disease; comparing the highest with the lowest intake category,
the HRs were 1.20 for low-grade cases (95% CI: 1.06, 1.37), 1.19
for early-stage cases (95% CI: 1.04, 1.35), and 1.21 for post-PSA
era cases detected by screening (95% CI: 1.02, 1.43) (P-trend ≤
0.01 for all the subgroup analyses). In contrast, for risk of fatal
PCa, whole milk was the only dairy food that had a positive
association [HR = 1.49 (95% CI: 0.97, 2.28); P-trend = 0.01].
This association was independent of age, cigarette smoking
status, BMI, alcohol intake, vigorous physical activity, diabetes
status, red meat consumption, and total energy intake from
recorded food items.

Finally, among all the PCa cases, we conducted a survival
analysis to evaluate the associations of prediagnostic dairy food
intake with risk of progression to fatal PCa after initial diagnosis
and found that whole milk was the only dairy food that was
significantly associated with an increased risk of PCa-specific
mortality (Table 4). Compared with nondrinkers of whole
milk, the multivariable-adjusted HR was 2.17 (95% CI: 1.34,
3.51; P-trend < 0.001) for those who consumed ≥237 mL/d
(1 serving/d). A stratified analysis on age at diagnosis showed
that high intake of whole milk was significantly associated
with risk of progression to fatal PCa in both old and young age
groups, except that there tended to be a J-shaped relation in
the older group (data not shown). In a stratified analysis on
the presentation of disease, we found that, among post-PSA
era cases presented by screening, whole milk intake was
associated with PCa deaths, although the q value was not
significant [HR = 1.82 (95% CI: 0.69, 4.84); P-trend = 0.07].
The associations with skim/low-fat milk, however, were not
significant in any of the substrata by PSA era and screening.


Discussion 

In this study, we confirmed and extended our previous findings
that total dairy product intake and calcium from dairy foods
were positively associated with overall risk of PCa. Admittedly,
the dairy variables in our study did not capture all dairy product
intake (they did not include information on intakes of yogurt, cream,
butter, etc.). However, according to data from the NHANES,
milk and cheese intakes can account for ~98% of total dairy
product intake (21). Thus, our data on available dairy food
items sufficiently represented the total dairy product intake in
our population. The magnitude of the overall association
between total dairy product intake and the risk of incident
PCa [HR = 1.12 (95% CI: 0.93, 1.35)] in this study, however,
was weaker than in our previous report [RR = 1.34 (95% CI:
1.04, 1.71)]. The current analysis had a much larger
sample size (2806 cases vs. 1012 cases) and an additional 15 y of
follow-up, which allowed us to evaluate specific subtypes of
dairy products and to examine subtypes of PCa, cancer diagnosed before
vs. in the PSA era, mode of diagnosis, and cancer-specific
mortality (9). We found that skim/low-fat milk intake was
related to a higher risk of nonaggressive disease (low-grade,
early-stage, and screen-detected cases), whereas whole milk intake
was associated with a higher risk of fatal PCa and, among all the
cases, with a higher risk of progression to fatal PCa.

TABLE 2 HR estimates for PCa by intake of dairy product and dairy calcium in the PHS (n = 21,660) [1]

| | Category 1 | Category 2 | Category 3 | Category 4 | Category 5 | P-trend [2] |
|---|---|---|---|---|---|---|
| All dairy food [3] | | | | | | |
| Cases/person-years | 388/76,216 | 446/86,740 | 586/98,871 | 910/137,667 | 458/69,738 | |
| Age-adjusted | 1.00 | 1.00 (0.88, 1.15) | 1.11 (0.98, 1.26) | 1.19 (1.06, 1.34) | 1.15 (1.00, 1.31) | 0.003 [4] |
| Multivariable-adjusted [5] | 1.00 | 0.96 (0.83, 1.11) | 1.07 (0.93, 1.23) | 1.15 (0.99, 1.32) | 1.12 (0.93, 1.35) | 0.06 |
| Whole milk [6] | | | | | | |
| Cases/person-years | 1674/279,675 | 504/86,554 | 273/47,723 | 244/39,924 | | |
| Age-adjusted | 1.00 | 0.97 (0.88, 1.08) | 0.89 (0.78, 1.01) | 0.89 (0.78, 1.02) | | 0.04 |
| Multivariable-adjusted [5] | 1.00 | 1.02 (0.92, 1.13) | 0.93 (0.81, 1.07) | 0.95 (0.81, 1.10) | | 0.32 |
| Skim/low-fat milk [6] | | | | | | |
| Cases/person-years | 895/160,367 | 531/98,250 | 579/94,591 | 724/104,959 | | |
| Age-adjusted | 1.00 | 1.05 (0.94, 1.17) | 1.17 (1.05, 1.29) | 1.21 (1.10, 1.34) | | [illegible in source] |
| Multivariable-adjusted [5] | 1.00 | 1.02 (0.91, 1.14) | 1.12 (1.00, 1.25) | 1.19 (1.06, 1.33) | | 0.001 [4] |
| Hard cheese [6] | | | | | | |
| Cases/person-years | 197/35,560 | 1207/208,462 | 1175/190,531 | 178/28,270 | | |
| Age-adjusted | 1.00 | 1.05 (0.90, 1.22) | 1.12 (0.96, 1.30) | 1.10 (0.90, 1.35) | | 0.14 |
| Multivariable-adjusted [5] | 1.00 | 1.01 (0.87, 1.18) | 1.07 (0.91, 1.25) | 1.05 (0.85, 1.30) | | 0.32 |
| Ice cream [6] | | | | | | |
| Cases/person-years | 455/75,120 | 1415/251,406 | 805/124,783 | 84/12,177 | | |
| Age-adjusted | 1.00 | 0.96 (0.86, 1.06) | 1.06 (0.95, 1.19) | 1.05 (0.83, 1.32) | | 0.06 |
| Multivariable-adjusted [5] | 1.00 | 0.95 (0.85, 1.06) | 1.02 (0.90, 1.15) | 1.03 (0.80, 1.32) | | 0.26 |
| Cold breakfast cereal [6] | | | | | | |
| Cases/person-years | 743/131,310 | 654/120,759 | 678/112,540 | 679/98,469 | | |
| Age-adjusted | 1.00 | 0.96 (0.86, 1.06) | 1.02 (0.92, 1.13) | 1.11 (1.00, 1.23) | | 0.01 [4] |
| Multivariable-adjusted [5] | 1.00 | 0.95 (0.85, 1.06) | 1.00 (0.88, 1.12) | 1.06 (0.93, 1.22) | | 0.17 |
| Calcium from dairy food [7] | | | | | | |
| Cases/person-years | 487/95,147 | 516/95,489 | 578/93,334 | 598/92,688 | 609/91,575 | |
| Age-adjusted | 1.00 | 1.04 (0.92, 1.18) | 1.15 (1.02, 1.30) | 1.16 (1.03, 1.31) | 1.17 (1.03, 1.31) | 0.004 [4] |
| Multivariable-adjusted [5] | 1.00 | 1.01 (0.89, 1.15) | 1.12 (0.98, 1.28) | 1.12 (0.97, 1.30) | 1.14 (0.97, 1.34) | 0.07 |

[1] Values are HR (95% CI). FDR, false-discovery rate; PCa, prostate cancer; PHS, Physicians' Health Study.

[2] Calculated in a separate regression model with the median intake levels in each category as a continuous variable.

[3] Based on the consumption of 5 major dairy foods (whole milk, skim/low-fat milk, hard cheese, ice cream, and cold breakfast cereal)
assessed from 1982 to 1984. The 5 intake level groups are: ≤0.5 servings/d, >0.5-1.0 serving/d, >1.0-1.5 servings/d, >1.5-2.5 servings/d,
and >2.5 servings/d. One serving of whole milk, skim/low-fat milk, or cold breakfast cereal = 237 mL; 1 serving of ice cream = 214 g;
1 serving of hard cheese = 28 g.

[4] FDR < 0.05.

[5] Adjusted for baseline measures of age (y), cigarette smoking (never, past, current), vigorous exercise (exercise vigorously to a sweat more
than twice per week or not), alcohol intake (drink alcoholic beverages every day or not), race (Caucasian, non-Caucasian), BMI (normal
weight, overweight, obese), baseline diabetes status (yes, no), red meat consumption (servings/wk), total energy intake from recorded food
items (kcal), assignment in the original aspirin trial (treatment, placebo), and assignment in the original β-carotene trial (treatment, placebo).
In addition, the models for whole milk and skim/low-fat milk were mutually adjusted for each other (rarely, ≤1 serving/wk, 2-6 servings/wk,
and ≥1 serving/d).

[6] The 4 intake level groups were: rarely, ≤1 serving/wk, 2-6 servings/wk, and ≥1 serving/d.

[7] The 5 intake level groups were categorized according to quintiles.

The  positive  association  between  dairy  product  intake  and 
PCa  has  been  reported  in  several  studies,  including  the  European 
Prospective  Investigation  into  Cancer  and  Nutrition  (22)  and 
studies  from  Canada  (23)  and  Japan  (4).  These  data  raised 
concerns  regarding  whether  dairy  should  be  recommended  as 
part  of  a  healthy  diet  for  aging  men  (24,25).  However,  the  results 
of 2 meta-analyses of the relation between dairy product intake
and PCa provided conflicting conclusions: one showed a
significant positive association (13) and the other (supported
by the National Dairy Council) showed an overall null association
(14). Part of the reason for this inconsistency could be a
lack of detailed data on the effect of whole compared with skim/low-fat
milk and their impact on high-risk disease or PCa-specific
death.


Our  finding  that  the  strongest  association  with  total  dairy 
products  was  in  the  pre-PSA  era  was  consistent  with  findings  of 
Rodriguez et al. (26). We observed a significant positive association
of skim/low-fat milk with overall PCa risk. These results are
consistent  with  previous  studies  (6,27).  Few  studies  specifically 
evaluated  high-risk  PCa.  Park  et  al.  (28)  observed  that  skim  milk, 
but  not  other  dairy  foods,  was  associated  with  a  nonsignificantly 
increased  risk  of  advanced  PCa.  The  null  effect  of  whole  milk  on 
overall  PCa  risk  is  likely  due  to  the  fact  that  the  whole  milk 
drinkers  accounted  for  only  a  small  portion  of  all  milk  drinkers. 
Thus,  the  associations  of  whole  milk  with  the  nonfatal  cases,  if 
any,  were  not  large  enough  to  be  detected  with  a  limited  number 
of  cases,  which  may  have  driven  the  overall  effect. 

The  commonly  accepted  risk  factors  for  incident  PCa  are 
older  age,  a  family  history  of  PCa,  and  being  African  American 
(29).  However,  there  is  no  consensus  about  risk  factors  for  fatal 
PCa  beyond  clinical  characteristics  such  as  PSA  at  diagnosis, 
Gleason grade, and clinical stage. Identifying modifiable risk
factors for fatal PCa is critical, because widespread PSA testing
in the US is likely to detect and overtreat a large number of men
with indolent cancer (30). A major challenge in PCa research is
distinguishing risk factors for aggressive PCa from those for indolent
disease to reduce overtreatment. Our results showed that higher
intakes of whole-fat milk predispose men to a higher risk of
developing fatal PCa and, once they had the cancer, a higher risk
of progression to fatal disease. This association was unlikely
to be confounded by skim/low-fat milk according to our analysis.

TABLE 3 Multivariable-adjusted HR estimates for categories of PCa cases by intake of dairy product in
the PHS (n = 21,660) [1,2]

| Selected case [3] | Category 1 | Category 2 | Category 3 | Category 4 | Category 5 | P-trend [4] |
|---|---|---|---|---|---|---|
| Dairy product [5] | | | | | | |
| High grade | 1.00 | 1.04 (0.69, 1.58) | 0.77 (0.50, 1.20) | 1.09 (0.71, 1.68) | 1.04 (0.60, 1.80) | 0.64 |
| Low grade | 1.00 | 0.95 (0.81, 1.12) | 1.11 (0.95, 1.30) | 1.13 (0.95, 1.33) | 1.13 (0.91, 1.39) | 0.12 |
| Advanced | 1.00 | 0.92 (0.59, 1.46) | 0.79 (0.50, 1.27) | 0.92 (0.57, 1.48) | 0.68 (0.36, 1.27) | 0.35 |
| Localized | 1.00 | 0.94 (0.80, 1.11) | 1.09 (0.93, 1.29) | 1.11 (0.94, 1.32) | 1.13 (0.91, 1.39) | 0.13 |
| Fatal | 1.00 | 1.19 (0.68, 2.06) | 1.81 (1.08, 3.02) | 2.14 (1.26, 3.64) | 1.73 (0.90, 3.35) | 0.05 |
| Pre-PSA | 1.00 | 1.70 (0.95, 3.05) | 1.77 (1.00, 3.13) | 1.82 (1.01, 3.27) | 2.12 (1.07, 4.19) | 0.10 |
| Post-PSA (symptom) | 1.00 | 1.44 (0.78, 2.68) | 1.25 (0.66, 2.34) | 1.83 (0.99, 3.40) | 1.61 (0.76, 3.40) | 0.19 |
| Post-PSA (screening) | 1.00 | 0.83 (0.67, 1.03) | 1.10 (0.90, 1.34) | 1.04 (0.84, 1.28) | 0.99 (0.75, 1.30) | 0.64 |
| Whole milk [6] | | | | | | |
| High grade | 1.00 | 0.69 (0.48, 1.00) | 1.29 (0.91, 1.84) | 0.78 (0.49, 1.25) | | 0.81 |
| Low grade | 1.00 | 1.09 (0.97, 1.23) | 0.86 (0.73, 1.01) | 0.91 (0.76, 1.10) | | 0.10 |
| Advanced | 1.00 | 0.89 (0.61, 1.29) | 1.04 (0.68, 1.61) | 0.83 (0.49, 1.41) | | 0.63 |
| Localized | 1.00 | 1.06 (0.94, 1.19) | 0.87 (0.74, 1.03) | 0.89 (0.74, 1.07) | | 0.08 |
| Fatal | 1.00 | 0.89 (0.60, 1.31) | 1.77 (1.23, 2.54) | 1.49 (0.97, 2.28) | | 0.01 [7] |
| Pre-PSA | 1.00 | 1.29 (0.89, 1.86) | 1.51 (1.00, 2.27) | 1.35 (0.85, 2.15) | | 0.15 |
| Post-PSA (symptom) | 1.00 | 1.22 (0.80, 1.86) | 1.19 (0.71, 1.99) | 1.29 (0.76, 2.21) | | 0.38 |
| Post-PSA (screening) | 1.00 | 1.00 (0.86, 1.17) | 0.74 (0.59, 0.93) | 0.73 (0.57, 0.94) | | 0.002 [7] |
| Skim/low-fat milk [6] | | | | | | |
| High grade | 1.00 | 1.11 (0.79, 1.56) | 1.07 (0.76, 1.51) | 1.19 (0.85, 1.67) | | 0.39 |
| Low grade | 1.00 | 0.99 (0.87, 1.13) | 1.18 (1.04, 1.35) | 1.20 (1.06, 1.37) | | 0.001 [7] |
| Advanced | 1.00 | 0.94 (0.64, 1.38) | 1.02 (0.69, 1.49) | 0.99 (0.67, 1.45) | | 0.96 |
| Localized | 1.00 | 0.99 (0.86, 1.13) | 1.18 (1.04, 1.35) | 1.19 (1.04, 1.35) | | 0.004 [7] |
| Fatal | 1.00 | 1.04 (0.72, 1.51) | 1.01 (0.69, 1.47) | 1.04 (0.71, 1.51) | | 0.91 |
| Pre-PSA | 1.00 | 1.38 (0.92, 2.07) | 1.69 (1.15, 2.48) | 1.43 (0.97, 2.12) | | 0.11 |
| Post-PSA (symptom) | 1.00 | 0.84 (0.52, 1.35) | 1.02 (0.64, 1.62) | 1.22 (0.79, 1.88) | | 0.23 |
| Post-PSA (screening) | 1.00 | 0.98 (0.83, 1.17) | 1.20 (1.01, 1.42) | 1.21 (1.02, 1.43) | | 0.01 [7] |

[1] Values are HR (95% CI). FDR, false-discovery rate; PCa, prostate cancer; PHS, Physicians' Health Study; PSA, prostate-specific antigen.

[2] Adjusted for baseline measures of age (y), cigarette smoking (never, past, current), vigorous exercise (exercise vigorously to a sweat more
than twice per week or not), alcohol intake (drink alcoholic beverages every day or not), race (Caucasian, non-Caucasian), BMI (normal
weight, overweight, obese), baseline diabetes status (yes, no), red meat consumption (servings/wk), total energy intake from recorded
food items (kcal), assignment in the original aspirin trial (treatment, placebo), and assignment in the original β-carotene trial (treatment,
placebo). In addition, the models for whole milk and skim/low-fat milk were mutually adjusted for each other (rarely, ≤1 serving/wk, 2-6
servings/wk, and ≥1 serving/d).

[3] High grade (n = 317): Gleason >7; low grade (n = 2105): Gleason ≤7; advanced (n = 272): T3/T4/N1/M1; localized (n = 2016): T1/T2; fatal
(n = 305): died of PCa; pre-PSA era (n = 274): diagnosed before 1990; post-PSA era: diagnosed after 1990; presented by symptom (n = 192):
presented by prostate-related symptoms or metastases; presented by screening (n = 1233): presented by PSA test screening or digital
rectal examination.

[4] Calculated in a separate regression model with the median intake in each category as a continuous variable.

[5] Based on baseline consumption of 5 major dairy foods (whole milk, skim/low-fat milk, hard cheese, ice cream, and cold breakfast cereal).
The 5 intake level groups are: ≤0.5 servings/d, >0.5-1.0 serving/d, >1.0-1.5 servings/d, >1.5-2.5 servings/d, and >2.5 servings/d. One
serving of whole milk, skim/low-fat milk, or cold breakfast cereal = 237 mL; 1 serving of ice cream = 214 g; 1 serving of hard cheese = 28 g.

[6] The 4 intake level groups were: rarely, ≤1 serving/wk, 2-6 servings/wk, and ≥1 serving/d.

[7] FDR < 0.05.

Given that dairy product intakes were assessed years before
cancer diagnosis, our findings need to be further confirmed by
cohorts with more detailed dietary information, especially
dietary intakes at or around the time of the cancer diagnosis.
In the Health Professionals Follow-up Study cohort, Chan
et al. (31) found that men in the highest compared with the
lowest quartile of milk consumption after diagnosis had a
nonsignificantly elevated risk of fatal PCa [HR = 1.30 (95%
CI: 0.93, 1.83)], but this study did not examine specific types
of dairy food. Another explanation of the association between
whole milk intake and fatal PCa risk is also possible: it may be
that men who drink more whole milk are less likely to
be screened and therefore are diagnosed at a later stage and are
at a higher risk for fatal disease. In the survival analysis, we
adjusted for Gleason score and stage of tumor at diagnosis.
The association remained significant after the adjustment, which
suggests that the association was not due to confounding by
screening. However, further data on PSA screening intensity are
needed to confirm or refute this explanation.

In our study, the average interval between dairy product
intake assessment and PCa diagnosis was 14 y, yielding possible
exposure misclassification. This is of particular concern for the
analysis of PCa survival, because patients may have changed
their diet after diagnosis. We evaluated correlations among
nutrients between the 2000 and 2004 FFQs, comparing men
diagnosed with PCa in that interval with those who remained
free of PCa. We found that the correlations ranged between 0.5
and 0.7 for all nutrients assessed, including dairy products.
There were no obvious trends in the absolute levels of intake
between cases and non-cases. These observations suggest that
men tended to keep their dietary habits after PCa diagnosis. One
advantage of using prediagnostic dietary information is to avoid
confounding by recall bias, change of diet due to disease severity
or treatments, or other reasons. Recently, Pettersson et al. (32)
found that in the Health Professionals Follow-up Study, postdiagnostic
milk and dairy product intake was not significantly
associated with increased risk of fatal PCa, whereas Torfadottir
et al. (33) found that milk intake during adolescence, rather than
in midlife or currently, was associated with advanced PCa. One
possibility is that dairy product intake in earlier life may be more
relevant to the progression and mortality of PCa in later life.

TABLE 4 HR estimates of PCa death by prediagnostic intake of dairy product and dairy calcium in PCa
cases in the PHS (n = 2806) [1]

| | Category 1 | Category 2 | Category 3 | Category 4 | Category 5 | P-trend [2] |
|---|---|---|---|---|---|---|
| All dairy food [3] | | | | | | |
| Deaths/person-years | 27/3601 | 45/4012 | 74/5222 | 115/8503 | 43/4416 | |
| Age-adjusted | 1.00 | 1.50 (0.93, 2.41) | 1.83 (1.18, 2.85) | 1.73 (1.14, 2.63) | 1.22 (0.75, 1.97) | 0.75 |
| Multivariable-adjusted [4,5] | 1.00 | 0.97 (0.53, 1.78) | 2.23 (1.26, 3.92) | 1.87 (1.04, 3.37) | 1.71 (0.82, 3.58) | 0.16 |
| Pre-PSA | 1.00 | 1.16 (0.33, 4.05) | 2.20 (0.66, 7.29) | 1.02 (0.29, 3.51) | 2.15 (0.53, 8.79) | 0.76 |
| Post-PSA (screening) | 1.00 | 0.74 (0.27, 2.03) | 1.09 (0.45, 2.64) | 1.24 (0.50, 3.06) | 0.93 (0.26, 3.36) | 0.80 |
| Whole milk [6] | | | | | | |
| Deaths/person-years | 161/15,350 | 43/4860 | 49/2504 | 43/2092 | | |
| Age-adjusted | 1.00 | 0.85 (0.60, 1.18) | 1.81 (1.32, 2.49) | 1.85 (1.32, 2.59) | | <0.001 [7] |
| Multivariable-adjusted [4,5] | 1.00 | 0.73 (0.47, 1.13) | 1.79 (1.15, 2.79) | 2.17 (1.34, 3.51) | | <0.001 [7] |
| Pre-PSA | 1.00 | 0.77 (0.36, 1.63) | 0.68 (0.23, 1.98) | 1.21 (0.45, 3.24) | | 0.67 |
| Post-PSA (screening) | 1.00 | 0.84 (0.40, 1.79) | 2.26 (1.07, 4.78) | 1.82 (0.69, 4.84) | | 0.07 |
| Skim/low-fat milk [6] | | | | | | |
| Deaths/person-years | 115/8106 | 58/4856 | 53/5493 | 68/6789 | | |
| Age-adjusted | 1.00 | 0.89 (0.65, 1.22) | 0.71 (0.51, 0.98) | 0.70 (0.52, 0.94) | | 0.02 [7] |
| Multivariable-adjusted [4,5] | 1.00 | 1.01 (0.67, 1.52) | 0.87 (0.56, 1.36) | 1.02 (0.67, 1.56) | | 0.98 |
| Pre-PSA | 1.00 | 1.18 (0.53, 2.59) | 0.61 (0.24, 1.56) | 0.77 (0.34, 1.76) | | 0.38 |
| Post-PSA (screening) | 1.00 | 1.48 (0.71, 3.11) | 1.34 (0.66, 2.73) | 1.22 (0.54, 2.73) | | 0.79 |
| Calcium from dairy food [8] | | | | | | |
| Deaths/person-years | 41/4412 | 52/4755 | 73/5174 | 74/5535 | 64/5879 | |
| Age-adjusted | 1.00 | 1.18 (0.79, 1.78) | 1.45 (0.99, 2.13) | 1.36 (0.93, 1.99) | 1.11 (0.75, 1.64) | 0.75 |
| Multivariable-adjusted [4,5] | 1.00 | 1.06 (0.63, 1.79) | 1.70 (1.00, 2.89) | 1.64 (0.94, 2.89) | 1.71 (0.91, 3.21) | 0.09 |
| Pre-PSA | 1.00 | 0.83 (0.33, 2.08) | 1.30 (0.49, 3.48) | 0.63 (0.23, 1.77) | 1.30 (0.41, 4.19) | 0.99 |
| Post-PSA (screening) | 1.00 | 1.09 (0.44, 2.69) | 1.08 (0.45, 2.62) | 1.63 (0.65, 4.10) | 1.22 (0.41, 3.65) | 0.59 |

[1] Values are HR (95% CI). FDR, false-discovery rate; PCa, prostate cancer; PHS, Physicians' Health Study; PSA, prostate-specific antigen.

[2] Calculated in a separate regression model with the median intake levels in each category as a continuous variable.

[3] Based on the consumption of 5 major dairy foods (whole milk, skim/low-fat milk, hard cheese, ice cream, and cold breakfast cereal)
assessed from 1982 to 1984. The 5 intake level groups are: ≤0.5 servings/d, >0.5-1.0 serving/d, >1.0-1.5 servings/d, >1.5-2.5 servings/d,
and >2.5 servings/d. One serving of whole milk, skim/low-fat milk, or cold breakfast cereal = 237 mL; 1 serving of ice cream = 214 g;
1 serving of hard cheese = 28 g.

[4] Adjusted for baseline measures of age at diagnosis (y), cigarette smoking (never, past, current), vigorous exercise (exercise vigorously to a
sweat more than twice per week or not), alcohol intake (drink alcoholic beverages every day or not), race (Caucasian, non-Caucasian), BMI
(normal weight, overweight, obese), baseline diabetes status (yes, no), red meat consumption (servings/wk), Gleason score (>7, ≤7), stage
of tumor (T3/T4/N1/M1, T1/T2), total energy intake from recorded food items (kcal), assignment in the original aspirin trial (treatment,
placebo), and assignment in the original β-carotene trial (treatment, placebo). In addition, the models for whole milk and skim/low-fat milk
were mutually adjusted for each other (rarely, ≤1 serving/wk, 2-6 servings/wk, and ≥1 serving/d).

[5] Pre-PSA era (n = 274): diagnosed before 1990; post-PSA era: diagnosed after 1990; presented by symptom (n = 192): presented by
prostate-related symptoms or metastases (results not presented because of very low statistical power); presented by screening (n = 1233):
presented by PSA test screening or digital rectal examination.

[6] The 4 intake level groups were: rarely, ≤1 serving/wk, 2-6 servings/wk, and ≥1 serving/d.

[7] FDR < 0.05.

[8] The 5 intake level groups were categorized according to quintiles.


Several potential mechanisms could explain the observed
associations of dairy food (primarily skim/low-fat milk) with
overall PCa risk. First, skim/low-fat milk is the major source of
dairy calcium, and higher intake might lower intracellular
1,25-dihydroxycholecalciferol concentrations and induce prostate
carcinogenesis (8,34-36). Second, the association could be
mediated via phytanic acid, which may upregulate expression
of α-methylacyl-CoA racemase (37,38). The involvement of
α-methylacyl-CoA racemase in PCa is implicated by a recent
observation (39). Third, the relation could be through the effect
of phosphate. Newmark et al. (40) suggested that the high dietary
phosphate content of dairy products might explain the risk of
PCa induced by dairy products, because the plasma phosphate
concentration can appreciably influence 1,25-dihydroxycholecalciferol
concentrations. Fourth, the ability of dairy products
to raise concentrations of insulin-like growth factor 1 has
also been suggested as a possible explanation for the association
(41-43). The association of whole milk with fatal PCa and
PCa-specific mortality may be via the effects of dairy fat (primarily
saturated fat) or other factors (including obesity and
hyperinsulinemia). Whole milk has a ~40 times higher content of
saturated fat than skim milk, and the difference in
saturated fat content between 237 mL of whole milk and skim
milk is ~20% of its average daily intake (17). High-fat dairy has
been positively correlated with higher C-peptide concentrations,
which were positively related to risk of aggressive PCa (44).

In summary, the results from the present study confirm a
potential role of dairy products in PCa risk and survival.
Skim/low-fat milk dairy products have been suggested as being
beneficial for several disease outcomes, including colorectal
cancer, so future research is warranted to investigate the
optimal intake of skim/low-fat dairy products. However, our
results add further evidence to suggest that the intake of
whole-fat dairy products is associated with the risk of developing
advanced or fatal PCa in elderly men and worse survival in PCa
cases. Thus, minimal intake of whole-fat dairy products may be
beneficial for elderly men, particularly PCa survivors. However,
these results still need to be confirmed in other male
populations.

Observational studies have found an inverse association between type 2 diabetes (T2D) and prostate cancer
(PCa), and genome-wide association studies have found common variants near 3 loci associated with both diseases.
The authors examined whether a genetic background that favors T2D is associated with risk of advanced
PCa. Data from the National Cancer Institute's Breast and Prostate Cancer Cohort Consortium, a genome-wide
association study of 2,782 advanced PCa cases and 4,458 controls, were used to evaluate whether individual
single nucleotide polymorphisms or aggregations of these 36 T2D susceptibility loci are associated with PCa.
Ten T2D markers near 9 loci (NOTCH2, ADCY5, JAZF1, CDKN2A/B, TCF7L2, KCNQ1, MTNR1B, FTO, and
HNF1B) were nominally associated with PCa (P < 0.05); the association for single nucleotide polymorphism
rs757210 at the HNF1B locus was significant when multiple comparisons were accounted for (adjusted
P = 0.001). Genetic risk scores weighted by the T2D log odds ratio and multilocus kernel tests also indicated a
significant relation between T2D variants and PCa risk. A mediation analysis of 9,065 PCa cases and 9,526
controls failed to produce evidence that diabetes mediates the association of the HNF1B locus with PCa risk.
These data suggest a shared genetic component between T2D and PCa and add to the evidence for an
interrelation between these diseases.
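
A genetic risk score weighted by the log odds ratio, of the kind named above, can be sketched as a sum of risk-allele counts multiplied by ln(OR). This is only an illustration of the general technique: the SNP labels and odds ratios below are hypothetical placeholders, not the 36 loci or the estimates used in the study, and the study's exact scoring rules are not given in this section.

```python
import math


def weighted_genetic_risk_score(risk_allele_counts, odds_ratios):
    """Sum of risk-allele counts (0, 1, or 2 per SNP) weighted by ln(OR).

    Both arguments are dicts keyed by SNP label; the labels and odds
    ratios in the example are hypothetical, for illustration only.
    """
    return sum(
        count * math.log(odds_ratios[snp])
        for snp, count in risk_allele_counts.items()
    )


# Hypothetical genotype: number of T2D risk alleles carried at three SNPs.
example_counts = {"snp_a": 2, "snp_b": 1, "snp_c": 0}
# Placeholder per-allele T2D odds ratios for the same SNPs.
example_odds_ratios = {"snp_a": 1.10, "snp_b": 1.15, "snp_c": 1.20}

example_grs = weighted_genetic_risk_score(example_counts, example_odds_ratios)
print(example_grs)
```

Weighting by ln(OR) lets variants with larger reported effects on T2D contribute more to the score than an unweighted count of risk alleles would allow.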

carcinoma;  diabetes  mellitus,  type  2;  genetic  predisposition  to  disease;  genetics;  genome-wide  association 
study;  humans;  polymorphism,  single  nucleotide;  prostatic  neoplasms 


Abbreviations: BPC3, Breast and Prostate Cancer Cohort Consortium; CI, confidence interval; GRS, genetic risk score; OR,
odds  ratio;  PCa,  prostate  cancer;  SNP,  single  nucleotide  polymorphism;  T2D,  type  2  diabetes. 


Prostate  cancer  (PCa)  and  type  2  diabetes  (T2D)  are  two 
of  the  most  common  chronic  diseases  afflicting  the  US 
aging  male  population  (1,  2).  Observational  studies  have 
consistently  shown  an  apparent  inverse  association  between 
T2D  and  risk  of  PCa,  with  meta-analysis  risk  ratios  ranging 
from  0.84  to  0.91  (3,  4).  The  reduction  in  PCa  risk  has 
been  reported  to  increase  with  years  since  T2D  diagnosis, 
with  men  who  have  had  T2D  for  more  than  15  years  being 
at a 22% reduced hazard of PCa (5). The association is
poorly understood, with one hypothesis suggesting that the
metabolic  status  of  men  with  T2D  could  move  gradually 
from  hyperinsulinemia  to  endogenous  insulin  deficiency, 
which  could  mitigate  the  oncogenic  action  of  insulin  in  the 
prostate  (6,  7). 

Recently, 3 shared genomic regions for T2D and PCa
have been highlighted. The first region, located on chromosome
17, is in intron 2 of HNF1B, formerly known as
TCF2. The major allele A of rs4430796 is positively
associated with PCa risk (odds ratio (OR) = 1.22) and inversely
associated with risk of T2D (OR = 0.91) (8-10).
The second region is located on chromosome 7 near the
JAZF1 locus, where the major allele G of rs10486567 is
inversely associated with risk of PCa (aggressive PCa:
OR = 0.89; nonaggressive PCa: OR = 0.74) (11), whereas
the minor allele G of rs864745 is positively associated with
T2D (OR = 1.10) (12). THADA is the third region, located
on chromosome 2, with the minor allele A of rs1465618
being associated with PCa (OR = 1.08) (13) and the major
allele T of rs7578597 associated with T2D (OR = 1.15)
(12). However, the single nucleotide polymorphisms
(SNPs) for T2D and PCa in the JAZF1 and THADA
regions are weakly linked, with R² values of 0.03 and 0.02,
respectively. It is not clear that these associations are driven
by the same haplotype (14, 15).

Stevens et al. (16) investigated the T2D-PCa relation
further and concluded that diabetic status did not mediate
the observed relation between the HNF1B and JAZF1 gene
variants and PCa risk. In the Atherosclerosis Risk in Communities
cohort, Meyer et al. (17) examined the relation of
T2D-associated variants with risk of PCa and found that 4
of 13 T2D SNPs were nominally associated with PCa,
which provides additional evidence that some of the T2D-PCa
association could be driven by shared genetic factors.
Another study by Pierce et al. (18) evaluated the ability of
risk scores, consisting of 18 replicated T2D risk variants, to
predict PCa risk and concluded that persons with increased
genetic susceptibility to T2D have a reduced risk of PCa.
However, in a recent study of 5 racial/ethnic groups in the
Multiethnic Cohort and PAGE (Population Architecture
using Genomics and Epidemiology), Waters et al. (19)
found no association between T2D risk variants, either individually
or in risk scores, and PCa risk.

With  a  large  sample  size  and  an  expanded  set  of  recently 
published  T2D  susceptibility  loci,  we  aimed  to  investigate 
whether  and  to  what  extent  individual  T2D  risk  variants  and 
aggregations  of  T2D  replicated  risk  variants  are  associated 
with  PCa  risk.  We  used  novel  approaches  to  test  both 
whether  these  risk  variants  are  inversely  associated  with 
PCa  risk  in  accordance  with  the  inverse  relation  observed 
between  T2D  and  PCa  in  observational  studies  and,  more 
generally,  whether  these  T2D  loci  are  associated  with  PCa 
risk without regard to directionality of association. Additionally,
using causal inference methods, our study attempted to
more  definitively  investigate  the  potential  for  mediation  of 
the  effect  of  HNF1B  on  PCa  risk  through  T2D  phenotype. 

MATERIALS  AND  METHODS 

Genotyping  data  for  PCa  cases  and  controls  came  from 
the  National  Cancer  Institute’s  Breast  and  Prostate  Cancer 
Cohort  Consortium  (BPC3).  The  BPC3  is  a  consortium  of 
prospective  cohort  studies,  with  contributors  including  the 
Alpha-Tocopherol,  Beta-Carotene  Cancer  Prevention  Study 
(20),  the  American  Cancer  Society  Cancer  Prevention 
Study  II  Nutrition  Cohort  (21),  the  European  Prospective 
Investigation  into  Cancer  and  Nutrition  (22),  the  Health 
Professionals Follow-up Study, the Melbourne Collaborative
Cohort Study (23), the Multiethnic Cohort Study (24),
the Physicians’ Health Study, and the Prostate, Lung, Colorectal,
and Ovarian Cancer Screening Trial (25). In total,
9,065 PCa cases and 9,526 controls comprised the PCa
nested case-control study. Diabetes phenotype was self-reported
at study baseline, with data available for 96.7% of
BPC3 participants. A genome-wide association scan was
conducted on a subset of 2,782 European cases with advanced
disease and 4,458 controls with European ancestry.
Advanced PCa was defined as PCa cases that had either a
high histologic grade (Gleason score ≥8) or extraprostatic
extension (stage C/D). All controls were free of PCa at the
time of selection and were sampled from the same cohort
as the cases. Controls were age-matched to cases, and study
indicator variables were used to adjust for sampling differences
between studies. Informed consent was received from
all  study  participants,  and  all  study  protocols  were  reviewed 
by  the  institutional  review  boards  of  the  National  Cancer 
Institute  and  each  participating  study  center. 

A literature search was conducted to find robustly replicated
disease susceptibility loci that are associated with
T2D at genome-wide significance levels (P < 5 × 10^-8). In
total, 36 independent autosomal loci associated with T2D
were identified, and published T2D risk alleles and odds
ratios were extracted (9, 10, 12, 26-36).

Individual association tests were carried out for each
T2D SNP with PCa risk in the BPC3 genome-wide association
study (37). Quality control filters were used to remove
samples with heterozygosity, underperforming samples or
markers, markers with genotype frequencies that significantly
departed from Hardy-Weinberg equilibrium, and
subjects with significant evidence of non-European ancestry
or sample structure. Of the 36 T2D SNPs, 19 were not
directly genotyped on the Illumina HumanHap610 Quad
Arrays (Illumina, San Diego, California) and were therefore
imputed with MACH (http://www.sph.umich.edu/csg/abecasis/MaCH/)
(38). MACH references the HapMap (http://hapmap.ncbi.nlm.nih.gov/)
CEU population (Utah residents with
Northern and Western European ancestry from the Centre
d’Etude du Polymorphisme Humain (CEPH) collection) to
infer expected genotype counts for each marker locus. MACH
quality scores and R² values were more than 0.85 and
0.75, respectively, for all imputed SNPs. Logistic regression
models were used to test for T2D SNP associations with PCa
risk. The number of T2D risk alleles was used as the exposure,
and adjustment was made for cohort (indicator variables).
A nominal association P value of 0.05 was used to
assess whether T2D markers exhibited more significant
associations with PCa than would be expected by chance.
Additional binomial and permutation tests (39) (10,000 permutations)
were carried out to test for a relation in risk allele
directionality and significant departures of the PCa association
statistics from the null distribution, respectively.

The T2D SNPs were combined to form a genetic risk
score (GRS) using the --score command in PLINK (40). The
GRS was calculated in two ways. The first method, referred
to here as the count method, involved summing the number
of T2D risk alleles at each locus (0, 1, or 2) and then
summing across all T2D loci. This count method is an additive
model that weights each locus equally and assumes
no gene-gene interactions. The second method, referred to
here as the weighted method, uses the log odds ratio of the
published T2D loci to weight the sum of T2D risk alleles at
each locus and then sums across all T2D loci. The weighted
method is an additive model that weights each locus in accordance
with the T2D literature and assumes no gene-gene
interactions. The rationale for weighting is to create a score
that is the best GRS for T2D and therefore can be used as
an instrument for testing an association with PCa. For each
GRS method, we included the GRS as a predictor in a logistic
regression model with PCa case-control status as the
outcome, and we adjusted for cohort with an indicator variable.
Cohort-specific associations were also calculated.
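
As a hedged illustration of the two scoring schemes just described, the Python sketch below computes an unweighted count GRS and a log odds ratio-weighted GRS for one person. It is not the PLINK implementation used in the study; the function names, the three-locus genotype, and the odds ratios are hypothetical values chosen only for illustration.

```python
import math

def count_grs(risk_allele_counts):
    """Unweighted score: total number of T2D risk alleles carried (0, 1, or 2 per locus)."""
    return sum(risk_allele_counts)

def weighted_grs(risk_allele_counts, odds_ratios):
    """Weighted score: risk-allele count at each locus times the published log odds ratio."""
    return sum(n * math.log(o) for n, o in zip(risk_allele_counts, odds_ratios))

# Hypothetical person: risk-allele counts at three loci (imputed dosages could be fractional).
counts = [2, 1, 0]
# Hypothetical published T2D odds ratios for the same three loci.
t2d_odds_ratios = [1.10, 1.15, 1.37]

unweighted = count_grs(counts)
weighted = weighted_grs(counts, t2d_odds_ratios)
max_count = 2 * len(counts)  # upper bound of the count score for this set of loci

print(unweighted, round(weighted, 3), max_count)
```

With 36 loci, the largest possible count score is 2 × 36 = 72 risk alleles, and the largest possible weighted score is twice the sum of the published log odds ratios.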

Additionally,  multilocus  linear  kernel  tests  were  used  to 
assess  the  joint  relation  between  the  36  T2D  variants  and 
PCa risk. These linear models allow associations of multiple
genetic loci to be tested simultaneously with one test
statistic  (41)  and  have  been  generalized  for  dichotomous 
outcomes  (42).  Unlike  the  GRS  methods,  these  tests 
require  no  prespecification  of  risk  allele  directionality  (i.e., 
that  the  risk  allele  is  associated  with  increased  risk  of  T2D 
and  decreased  risk  of  PCa). 
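
A minimal sketch of the idea behind a linear kernel test follows. It assumes no covariates and uses a permutation P value in place of the asymptotic null distribution used by the cited methods (41, 42); all function names and the simulated data are hypothetical. The statistic sums squared per-SNP score contributions, so recoding a SNP by its other allele leaves it unchanged, which is why no risk allele direction has to be specified.

```python
import math
import random

def linear_kernel_score(genotypes, status):
    """Q = r' G G' r, where r is case-control status minus its mean and G holds allele counts."""
    mean_status = sum(status) / len(status)
    residuals = [y - mean_status for y in status]
    q = 0.0
    for j in range(len(genotypes[0])):
        snp_score = sum(row[j] * r for row, r in zip(genotypes, residuals))
        q += snp_score * snp_score
    return q

def permutation_p_value(genotypes, status, n_perm=500, seed=1):
    """Share of label permutations whose Q is at least as large as the observed Q."""
    rng = random.Random(seed)
    observed = linear_kernel_score(genotypes, status)
    shuffled = list(status)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if linear_kernel_score(genotypes, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Simulated toy data: 60 people, 5 SNPs, random case-control labels (all hypothetical).
data_rng = random.Random(0)
genotypes = [[data_rng.choice([0, 1, 2]) for _ in range(5)] for _ in range(60)]
status = [data_rng.choice([0, 1]) for _ in range(60)]

# Recode the first SNP by counting the other allele instead.
flipped = [[2 - row[0]] + row[1:] for row in genotypes]

q_original = linear_kernel_score(genotypes, status)
q_flipped = linear_kernel_score(flipped, status)
p_value = permutation_p_value(genotypes, status)
print(math.isclose(q_original, q_flipped, rel_tol=1e-9, abs_tol=1e-9), p_value)
```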

The HNF1B locus was the only T2D locus significantly
associated with PCa risk after adjustment for multiple comparisons,
so it was carried forward for mediation analysis to
evaluate whether T2D phenotype is a potential mediator of
the relation between HNF1B and PCa. We used an expanded
set of data on 9,065 PCa cases (including nonaggressive
cases) and 9,526 controls from the BPC3 (43) with self-reported
information on diabetes phenotype. Data on
rs7501939 at HNF1B were generated as part of a previous
project characterizing known PCa loci; this SNP is in high
linkage disequilibrium with rs757210 (R² = 0.81). This was
the only T2D risk marker typed in the larger BPC3 data
set. To assess mediation, we used the mediation framework
proposed by Baron and Kenny (44), extended into the counterfactual
framework by VanderWeele and Vansteelandt
(45) as direct and indirect effects, and further generalized
for use with dichotomous intermediate and outcome. This
framework for mediation analysis is flexible to an interaction
between exposure and an intermediate factor, has a
causal interpretation, and can assess mediation on both the
multiplicative and additive scales. Assessing mediation in
this manner involved fitting both an outcome model and a
mediator model. The outcome model was a logistic regression
model that modeled PCa as the outcome, included parameters
for the T2D variant of interest and diabetes
phenotype, and adjusted for potential confounders of the
exposure-outcome and intermediate-outcome relations, including
cohort indicator, age at baseline, and body mass
index (weight (kg)/height (m)²). The mediator model was a
logistic regression model that modeled diabetes phenotype
as the outcome, included a parameter for the T2D variant
of interest, and controlled for potential confounders, including
cohort indicator, age at baseline, and body mass index.
In the mediator model, the case-control nature of the BPC3
needed to be accounted for to obtain consistent effect estimates.
This was accomplished by fitting the model only in
the PCa controls, who represent the study’s base population,
and assuming a rare outcome. Once both the outcome
and mediator models were fitted, parameter estimates were
used to calculate direct and indirect (mediated) effects by
which to assess mediation (45).
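
The direct and indirect effects can be sketched on the odds ratio scale as follows, assuming a dichotomous mediator, a rare dichotomous outcome, logistic outcome and mediator models, and covariates held at a reference level that is absorbed into the mediator intercept. The closed forms are those commonly given for this setting in the counterfactual mediation literature. The function name, the 5% baseline diabetes prevalence, and the absence of an exposure-mediator interaction are assumptions made for illustration, and the coefficients are simply the logs of the odds ratios reported for this analysis rather than the fitted BPC3 models.

```python
import math

def natural_effects(theta1, theta2, theta3, beta0, beta1, a=1, a_star=0):
    """Natural direct, natural indirect, and total effect odds ratios for exposure a vs. a_star.

    Outcome model:  logit P(Y=1 | a, m) = theta0 + theta1*a + theta2*m + theta3*a*m
    Mediator model: logit P(M=1 | a)    = beta0 + beta1*a
    """
    b_star = beta0 + beta1 * a_star   # mediator log-odds at the reference exposure
    b = beta0 + beta1 * a             # mediator log-odds at the comparison exposure
    nde = (math.exp(theta1 * (a - a_star))
           * (1 + math.exp(theta2 + theta3 * a + b_star))
           / (1 + math.exp(theta2 + theta3 * a_star + b_star)))
    nie = ((1 + math.exp(b_star)) * (1 + math.exp(theta2 + theta3 * a + b))
           / ((1 + math.exp(b)) * (1 + math.exp(theta2 + theta3 * a + b_star))))
    return nde, nie, nde * nie

# Assumed inputs: logs of the reported odds ratios, no interaction, 5% baseline prevalence.
theta1 = math.log(0.83)          # HNF1B variant -> PCa (outcome model)
theta2 = math.log(0.76)          # diabetes -> PCa (outcome model)
theta3 = 0.0                     # assumption: no exposure-mediator interaction
beta1 = math.log(1.10)           # HNF1B variant -> diabetes (mediator model)
beta0 = math.log(0.05 / 0.95)    # assumption: hypothetical baseline odds of diabetes

nde, nie, total = natural_effects(theta1, theta2, theta3, beta0, beta1)
print(round(nde, 2), round(nie, 3), round(total, 2))
```

Under these assumptions the indirect effect stays very close to 1 because the variant shifts diabetes prevalence only slightly, so almost all of the total effect is direct.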

The  PCa  study  was  conducted  between  May  and  August 
of  2011.  All  statistical  analyses  were  carried  out  in  SAS  9.1 
(SAS  Institute  Inc.,  Cary,  North  Carolina)  and  R  2.11.1  (R 
Foundation  for  Statistical  Computing,  Vienna,  Austria). 

RESULTS 

Results from the individual association tests showed that
10 of the 36 T2D markers had a P value less than 0.05 for
association with PCa, significantly more than the 1.8
markers that would be expected by chance (P = 7.5 × 10^-6)
(Table 1). These markers include the HNF1B and JAZF1
loci, as well as NOTCH2, ADCY5, CDKN2A/B, TCF7L2,
MTNR1B, FTO, and 2 independent loci at KCNQ1
(Table 1). After permutation adjustment for multiple comparisons,
only HNF1B remained significant (adjusted
P = 0.001). Small fluctuations in effect estimates of <3%
were observed when adjustment for diabetes status was
made in the models, with overall conclusions remaining the
same (results not shown). We observed an inflation in the
observed P values for these 36 SNPs (λGC = 2.0; Figure 1).
When the observed λGC was compared with the distribution
of permutation λGC values, the observed λGC was significantly
elevated (P = 0.03), which indicated that the distribution
of association P values was significantly lower than
expected.
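
The inflation index can be approximated from the P values alone. The sketch below assumes the usual definition of λGC (the median observed 1-degree-of-freedom chi-square statistic divided by the median of the null chi-square distribution) and uses the 36 P values listed in Table 1. The HNF1B P value is not legible in Table 1, so a placeholder of 0.0001 is used; because it is the smallest value either way, it does not affect the median. Function names are hypothetical.

```python
from statistics import NormalDist, median

def chi2_from_p(p):
    """1-df chi-square statistic corresponding to a two-sided P value."""
    z = NormalDist().inv_cdf(1 - p / 2)
    return z * z

def genomic_inflation(p_values):
    """Median observed chi-square divided by the null median (about 0.455)."""
    null_median = chi2_from_p(0.5)
    return median(chi2_from_p(p) for p in p_values) / null_median

# P values from Table 1, in table order; the last entry (HNF1B) is a placeholder.
table1_p_values = [
    0.008, 0.845, 0.498, 0.644, 0.511, 0.140, 0.465, 0.853, 0.028,
    0.456, 0.924, 0.672, 0.270, 0.945, 0.033, 0.256, 0.627, 0.668,
    0.963, 0.045, 0.312, 0.206, 0.713, 0.009, 0.014, 0.030, 0.719,
    0.963, 0.023, 0.764, 0.259, 0.346, 0.346, 0.286, 0.041,
    0.0001,  # assumption: stand-in for the illegible HNF1B P value
]

lambda_gc = genomic_inflation(table1_p_values)
print(round(lambda_gc, 2))
```

With these inputs the ratio comes out at roughly 1.95, matching the inflation index given in the Figure 1 caption.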

We used exact binomial tests to assess whether significantly
more T2D risk alleles were inversely associated with
PCa risk than would be expected by chance. By chance
alone, 1.8 of the 36 markers would be expected to be significant,
of which, under the null, 0.9 would be expected to
be significantly associated with increased risk of PCa and
0.9 would be expected to be significantly associated with
decreased risk of PCa. In our data, we observed 2 T2D loci
that were significantly associated with increased PCa risk,
which did not differ statistically from the 0.9 loci expected
by chance (P = 0.23). However, the 8 T2D loci we observed
to be significantly associated with reduced risk of
PCa were significantly more than the 0.9 that would be expected
by chance (P = 2.45 × 10^-6), which indicates that
more T2D risk alleles than expected are associated with
reduced risk of PCa.
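
These chance expectations and exact binomial P values can be approximated with a few lines of Python. The sketch assumes one-sided upper-tail tests with 36 independent markers, a 0.05 probability that any marker is nominally significant, and a 0.025 probability for each direction; the function name is hypothetical.

```python
from math import comb

def binomial_upper_tail(k, n, p):
    """Exact probability of observing k or more successes in n independent trials."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

n_markers = 36
expected_significant = n_markers * 0.05        # markers expected at P < 0.05 by chance
expected_each_direction = n_markers * 0.025    # expected per direction of association

p_any = binomial_upper_tail(10, n_markers, 0.05)        # 10 nominally significant markers
p_increased = binomial_upper_tail(2, n_markers, 0.025)  # 2 associated with increased risk
p_decreased = binomial_upper_tail(8, n_markers, 0.025)  # 8 associated with reduced risk

print(expected_significant, expected_each_direction)
print(p_any, p_increased, p_decreased)
```

Under these assumptions the three tail probabilities come out near 7.5 × 10^-6, 0.23, and 2.5 × 10^-6, in line with the values reported above.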

Associations for GRS using both the unweighted count
and the weighted log odds method are shown in Table 2.
The risk score for the unweighted count did not show evidence
for an association of these genetic variants with PCa
risk. However, a significant association was observed for
the weighted log odds method when HNF1B was both included
in (P = 0.002) and excluded from (P = 0.015) the
GRS. No changes in results were observed when we adjusted
for diabetes status in the models (results not shown).
Study-specific analyses showed that the log odds-weighted
GRS was statistically significant only in the Prostate, Lung,
Colorectal, and Ovarian Cancer Screening Trial, although
the test for heterogeneity indicated no significant departures
from homogeneity (P = 0.60).
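
The source does not state which heterogeneity test was used. One common choice, shown here purely as an assumption, is Cochran's Q with inverse-variance weights. The sketch recovers approximate standard errors from the cohort-specific 95% confidence intervals of the log odds-weighted GRS (HNF1B included) in Table 2; function names are hypothetical.

```python
import math

# Cohort-specific OR and 95% CI for the log odds-weighted GRS (Table 2, HNF1B included).
cohort_estimates = {
    "ATBC": (0.93, 0.68, 1.29),
    "CPSII": (0.89, 0.69, 1.14),
    "EPIC": (0.90, 0.67, 1.20),
    "HPFS": (1.11, 0.73, 1.68),
    "MEC": (0.78, 0.53, 1.15),
    "PHS": (0.76, 0.53, 1.07),
    "PLCO": (0.74, 0.61, 0.91),
}

def cochran_q(estimates):
    """Return (Q, pooled OR) from fixed-effect inverse-variance pooling of log odds ratios."""
    betas, weights = [], []
    for odds_ratio, lower, upper in estimates.values():
        # Approximate standard error of the log OR from the width of the 95% CI.
        se = (math.log(upper) - math.log(lower)) / (2 * 1.96)
        betas.append(math.log(odds_ratio))
        weights.append(1 / se ** 2)
    pooled = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, betas))
    return q, math.exp(pooled)

def chi2_upper_tail_even_df(x, df):
    """Upper-tail probability of a chi-square variable; closed form valid for even df only."""
    half = x / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))

q_stat, pooled_or = cochran_q(cohort_estimates)
p_heterogeneity = chi2_upper_tail_even_df(q_stat, df=len(cohort_estimates) - 1)
print(round(pooled_or, 2), round(q_stat, 2), round(p_heterogeneity, 2))
```

With the rounded Table 2 values, this gives a pooled odds ratio of about 0.84 and a heterogeneity P value of about 0.6.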

Table 1. Individual Associations of 36 Independent Type 2 Diabetes Susceptibility Variants With Prostate Cancer Risk in the Breast and
Prostate Cancer Cohort Consortium^a

Chromosome  Reported Gene(s)  SNP         Genotyped?^b  T2D Risk Allele  Frequency of Risk Allele  Odds Ratio^c  95% CI      P Value       Adjusted P Value
1           NOTCH2            rs10923931  No            T                0.11                      0.86^d        0.76, 0.96  0.008*        0.255
1           PROX1             rs340874    Yes           C                0.52                      1.01          0.94, 1.08  0.845         1.000
2           GCKR              rs780094    Yes           C                0.61                      0.98^d        0.91, 1.05  0.498         1.000
2           THADA             rs7578597   Yes           T                0.91                      1.03          0.91, 1.16  0.644         1.000
2           BCL11A            rs243021    Yes           A                0.47                      1.02          0.95, 1.10  0.511         1.000
2           IRS1              rs2943641   Yes           C                0.64                      0.95^d        0.88, 1.02  0.140         0.995
3           PPARG             rs1801282   No            C                0.86                      0.96^d        0.87, 1.07  0.465         1.000
3           ADAMTS9           rs4607103   No            C                0.76                      0.99^d        0.91, 1.08  0.853         1.000
3           ADCY5             rs11708067  No            A                0.78                      0.91^d        0.84, 0.99  0.028*        0.630
3           IGF2BP2           rs4402960   Yes           T                0.32                      1.03          0.95, 1.11  0.456         1.000
4           WFS1              rs10010131  No            G                0.60                      1.00          0.93, 1.07  0.924         1.000
5           ZBED3             rs4457053   No            G                0.29                      1.02          0.94, 1.10  0.672         1.000
6           CDKAL1            rs7754840   Yes           C                0.32                      1.04          0.97, 1.13  0.270         1.000
7           DGKB              rs2191349   No            T                0.52                      1.00          0.93, 1.07  0.945         1.000
7           JAZF1             rs864745    No            T                0.50                      1.08          1.01, 1.16  0.033*        0.694
7           GCK               rs4607517   Yes           A                0.15                      1.06          0.96, 1.16  0.256         1.000
7           KLF14             rs972283    No            G                0.53                      1.02          0.95, 1.09  0.627         1.000
8           TP53INP1          rs896854    Yes           T                0.51                      1.02          0.95, 1.09  0.668         1.000
8           SLC30A8           rs13266634  Yes           C                0.68                      1.00          0.93, 1.08  0.963         1.000
9           CDKN2A/B          rs10811661  No            T                0.82                      0.91^d        0.83, 1.00  0.045*        0.809
9           TLE4              rs13292136  No            C                0.93                      0.93^d        0.81, 1.07  0.312         1.000
10          CDC123/CAMK1D     rs12779790  No            G                0.18                      1.06          0.97, 1.16  0.206         1.000
10          HHEX/IDE          rs1111875   Yes           C                0.58                      1.01          0.94, 1.09  0.713         1.000
10          TCF7L2            rs7903146   Yes           T                0.28                      0.90^d        0.83, 0.97  0.009*        0.276
11          KCNQ1             rs231362    No            G                0.50                      0.92^d        0.86, 0.98  0.014*        0.393
11          KCNQ1             rs2237892   Yes           C                0.94                      0.85^d        0.74, 0.98  0.030*        0.659
11          KCNJ11            rs5215      Yes           T                0.61                      0.99^d        0.92, 1.06  0.719         1.000
11          CENTD2            rs1552224   Yes           A                0.83                      1.00          0.91, 1.10  0.963         1.000
11          MTNR1B            rs10830963  No            G                0.28                      1.10          1.01, 1.19  0.023*        0.561
12          HMGA2             rs1531343   No            C                0.10                      0.98^d        0.88, 1.10  0.764         1.000
12          TSPAN8/LGR5       rs7961581   No            C                0.26                      1.05          0.97, 1.13  0.259         1.000
12          HNF1A/TCF1        rs7957197   No            T                0.80                      0.96^d        0.88, 1.05  0.346         1.000
15          ZFAND6            rs11634397  No            G                0.66                      1.04          0.96, 1.12  0.346         1.000
15          PRC1              rs8042680   Yes           A                0.32                      1.04          0.97, 1.12  0.286         1.000
16          FTO               rs9939609   No            A                0.40                      0.93^d        0.86, 1.00  0.041*        0.775
17          HNF1B/TCF2        rs757210    Yes           T                0.35                      0.85^d        0.79, 0.92  [illegible]*  0.001^e

Abbreviations: CI, confidence interval; OR, odds ratio; T2D, type 2 diabetes; RA, risk allele; SNP, single nucleotide polymorphism.
* P < 0.05.
^a Association tests were carried out in the Breast and Prostate Cancer Cohort Consortium using a log-additive genetic model with adjustment
made for cohort indicators.
^b Indicates whether or not variants were genotyped. Variants that were not directly genotyped were imputed.
^c Odds ratio for the increase in prostate cancer risk associated with a 1-unit increase in the number of type 2 diabetes risk alleles carried at
each locus.
^d Association for prostate cancer was in the inverse direction.
^e Significant after permutation correction for multiple testing.

Figure 1. Quantile-quantile plot comparing the uniformly
distributed -log10 P values for the 36 type 2 diabetes (T2D)
susceptibility markers with -log10 P values observed in the Breast
and Prostate Cancer Cohort Consortium data set when the authors
tested for an association with prostate cancer (PCa) risk by means
of a Wald test. The dotted line shows the expected -log10 P value
distribution. The black points represent observed P values for the
association of each T2D locus with PCa risk. The gray region is the
95% confidence interval for 10,000 permutations. The inflation index
(λGC) of 1.95 is significantly elevated (P = 0.02), which indicates an
overall inflation in association P values but gives no information
about the directionality of association between the T2D variants and
PCa risk.

The multilocus kernel test that jointly tested for a PCa
association with all 36 T2D loci without specifying weight
or directionality of risk alleles was statistically significant
(P = 0.0001). When HNF1B was removed from the list of
included markers and the remaining 35 markers were fitted,
the P value was attenuated but remained significant
(P = 0.01), which indicated that a substantial portion of the
association was a result of the HNF1B locus but that other
T2D loci were associated with PCa as well.

We conducted mediation analyses for the HNF1B locus to
investigate whether the locus had effects that act directly on
PCa risk or whether the effects of the locus were mediated
through diabetes phenotype (Table 3). The outcome model
produced significant evidence for an association between
HNF1B and PCa risk (OR = 0.83, 95% confidence interval
(CI): 0.79, 0.86; P = 6.37 × 10^-19) and an association
between diabetes phenotype and PCa risk (OR = 0.76, 95%
CI: 0.66, 0.87; P = 8.13 × 10^-5). The mediator model indicated
that the minor T allele of rs7501939 was not statistically
significantly associated with an increased risk of diabetes
among the 9,526 PCa controls (OR = 1.10, 95% CI: 0.97,
1.25; P = 0.14), although the per-allele odds ratio for association
with T2D was consistent with previous reports (8-10).
When these results were combined, the estimated direct
effect of HNF1B on PCa risk was statistically significant
(OR = 0.83, 95% CI: 0.79, 0.86; P = 1.02 × 10^-18), but the
mediated (indirect) effect through diabetes phenotype was
nonsignificant (OR = 1.00, 95% CI: 1.00, 1.00; P = 0.71).
These results are in agreement with the standard mediation
analysis, which produced an insignificant 0.5% change in the
parameter estimate for the effect of HNF1B when diabetes
status was included as a covariate.

DISCUSSION 

Our study suggests that genetic variants associated with
T2D are also associated with PCa risk. Ten of 36 T2D susceptibility
markers were nominally associated with PCa
risk at NOTCH2, ADCY5, JAZF1, CDKN2A/B, TCF7L2,
KCNQ1, MTNR1B, FTO, and HNF1B, although only the
HNF1B locus remained significantly associated with PCa
risk after adjustment for multiple testing. However, log
odds ratio-weighted GRS and kernel machine models also
were associated with PCa risk both with and without inclusion
of the HNF1B locus, which suggests that other genetic
variants associated with T2D risk also contribute to PCa
risk. Finally, mediation analysis provided insufficient evidence
that the association of the HNF1B locus with PCa
risk is mediated through diabetes phenotype.

Our  study  adds  to  the  evidence  that  a  genetic  background 
favorable  to  the  development  of  T2D  is  associated  with 
PCa  risk.  The  HNF1B  locus  was  most  strongly  associated 
with  PCa  risk  in  this  analysis  and  accounted  for  some  but 
not  all  of  the  association  between  the  T2D  variants  and 
PCa  risk  in  the  GRS  and  the  kernel  regression.  The  noted 
inflation  in  our  association  P  values  for  other  T2D  SNPs  is 
consistent with what others have observed (17, 18) and indicates
that more germline variants are held in common
between  T2D  and  PCa  than  would  be  expected  by  chance. 

Our study’s large sample size and recently published
T2D susceptibility loci permitted us to detect potentially
novel genetic relations between T2D and PCa that have not
been reported previously. Seven loci (NOTCH2, ADCY5,
CDKN2A/B, TCF7L2, KCNQ1, MTNR1B, and FTO) not
previously associated with PCa at genome-wide significance
levels were seen as nominally associated in our
study, one of which (FTO) was also reported by Pierce
et al. (18). Four of these loci (CDKN2A/B, TCF7L2,
KCNQ1, and MTNR1B) are associated with beta
cell dysfunction or impaired insulin release and could result
in less insulin production, thus blunting insulin effects in
increasing PCa risk (46). Additionally, our second most
highly associated locus, the NOTCH2 locus (P = 0.008; permutation
P = 0.26), is of interest. NOTCH2 is a member of
the NOTCH family of receptors, which modulate cellular
differentiation, proliferation, and apoptosis (47). The locus
has been reported to be associated with both T2D and
breast cancer (48, 49). Evidence from gene expression data
indicates that NOTCH2 is expressed in developing prostate
stroma and that NOTCH signaling affects stromal survival
only in the presence of testosterone (50). Therefore, the regulatory
ability of NOTCH2 and its sensitivity to the presence
of testosterone might be important in prostate
carcinogenesis, although additional studies are needed to
investigate this further.

Machiela et al. Am J Epidemiol. 2012;176(12):1121-1129

Table 2. Individual Cohort and Combined Results for Unweighted and Log Odds Ratio-Weighted Type 2 Diabetes Genetic Risk Score in the
Breast and Prostate Cancer Cohort Consortium^a

           No. in  No. of           Mean^c GRS          GRS                        GRS (−HNF1B)^d
Cohort     Cohort  Cases   Total^b  Cases    Controls   OR    95% CI      P Value  OR    95% CI      P Value
Unweighted count
ATBC       1,490   245     72       36.48    36.44      1.00  0.97, 1.04  0.894    1.00  0.97, 1.04  0.841
CPSII      1,258   636     72       37.48    37.55      1.00  0.97, 1.03  0.740    1.00  0.97, 1.03  0.839
EPIC       857     431     72       37.47    37.66      0.99  0.95, 1.02  0.460    1.00  0.96, 1.04  0.984
HPFS       418     214     72       37.70    37.47      1.02  0.97, 1.07  0.539    1.02  0.97, 1.08  0.419
MEC        503     244     72       37.80    37.89      0.99  0.95, 1.04  0.779    1.00  0.96, 1.05  0.936
PHS        553     298     72       37.59    37.81      0.99  0.95, 1.03  0.521    1.00  0.95, 1.04  0.800
PLCO       2,161   714     72       37.36    37.64      0.98  0.96, 1.00  0.111    0.98  0.96, 1.01  0.191
Combined^e 7,240   2,782   72       37.42    37.31      0.99  0.98, 1.00  0.168    1.00  0.98, 1.01  0.534
Weighted log OR
ATBC       1,490   245     8.16     4.33     4.34       0.93  0.68, 1.29  0.675    0.94  0.68, 1.30  0.718
CPSII      1,258   636     8.16     4.45     4.47       0.89  0.69, 1.14  0.358    0.90  0.70, 1.16  0.416
EPIC       857     431     8.16     4.45     4.47       0.90  0.67, 1.20  0.460    1.01  0.75, 1.36  0.961
HPFS       418     214     8.16     4.49     4.46       1.11  0.73, 1.68  0.635    1.17  0.76, 1.80  0.481
MEC        503     244     8.16     4.49     4.54       0.78  0.53, 1.15  0.215    0.83  0.56, 1.23  0.352
PHS        553     298     8.16     4.45     4.52       0.76  0.53, 1.07  0.118    0.80  0.56, 1.15  0.232
PLCO       2,161   714     8.16     4.43     4.49       0.74  0.61, 0.91  0.004    0.76  0.62, 0.93  0.008
Combined^e 7,240   2,782   8.16     4.44     4.45       0.84  0.75, 0.94  0.002    0.87  0.78, 0.97  0.015

Abbreviations: ATBC, Alpha-Tocopherol, Beta-Carotene Cancer Prevention Study; CI, confidence interval; CPSII, American Cancer Society
Cancer Prevention Study II Nutrition Cohort; EPIC, European Prospective Investigation into Cancer and Nutrition; GRS, genetic risk score;
HPFS, Health Professionals Follow-up Study; MEC, Multiethnic Cohort Study; OR, odds ratio; PCa, prostate cancer; PHS, Physicians’ Health
Study; PLCO, Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial; T2D, type 2 diabetes.
^a Logistic regression models were used to regress GRS on risk of PCa.
^b Total indicates the maximum bound for the respective GRS, with a value close to this total indicating high genetic predisposition for T2D.
^c Mean GRS was calculated for PCa cases and PCa controls.
^d HNF1B was excluded from the GRS and included as a separate covariate.
^e For combined estimates, cohort indicators were added to adjust for cohort effects.

Table 3. Mediation Analysis for the Association Between HNF1B
(rs7501939) and Prostate Cancer With Diabetes Phenotype as a
Potential Intermediate in the Breast and Prostate Cancer Cohort
Consortium^a

                                 Odds Ratio  95% Confidence Interval  P Value
HNF1B-T2D association            1.10        0.97, 1.25               0.14
T2D-prostate cancer association  0.76        0.66, 0.87               8.13 × 10^-5
Natural indirect effect          1.00        1.00, 1.00               0.71
Natural direct effect            0.83        0.79, 0.86               1.02 × 10^-18
Total effect                     0.83        0.79, 0.86               6.37 × 10^-19

Abbreviation: T2D, type 2 diabetes.
^a All analyses were conducted in the Breast and Prostate Cancer
Cohort Consortium and were adjusted for cohort indicator, age at
baseline (years), and body mass index (weight (kg)/height (m)²).

Our use of GRS and kernel machine models allowed us
to investigate the cumulative effect of T2D susceptibility
variants on PCa risk. Although another study was successful
in showing an association between unweighted T2D
GRS and PCa (18), our study did not find a relation
between  unweighted  T2D  risk  scores  and  PCa.  A  potential 
explanation  for  our  lack  of  association  is  that  with  the  most 
recent  T2D  loci  added  to  our  risk  score,  including  T2D  var¬ 
iants  found  through  meta-analyses  with  lower-than-average 
effect  sizes,  the  number  of  SNPs  doubled,  and  the  range  of 
effect  estimates  for  each  variant  might  have  widened.  Our 
study  did  find  a  significant  association  between  the  log 
odds-weighted  T2D  risk  scores  and  PCa.  This  association 
was  significant  when  the  HNF1B  locus  was  both  included 
in  and  excluded  from  the  GRS.  Although  one  of  the  larger 
cohorts,  the  Prostate,  Lung,  Colorectal,  and  Ovarian  Cancer 
Screening  Trial,  seems  to  have  been  responsible  for  most  of 
this  association,  a  test  of  heterogeneity  indicated  that  there 
was  no  significant  evidence  for  heterogeneity.  The  fact  that 
the  log  odds  ratio-weighted  GRS  was  significant  and  the 
unweighted risk score was not significant indicates that some
T2D  variants  could  have  a  stronger  influence  on  PCa  risk 
than  others.  The  GRS  approach  makes  the  assumption  that 
all  T2D  loci  included  in  the  GRS  have  T2D  risk  alleles  that 
function  in  the  same  direction  when  PCa  risk  is  considered. 
This  might  not  be  the  case,  with  some  T2D-associated  loci 
possibly having the same rather than the (expected) opposite
direction of effect on PCa. Multilocus kernel tests
allowed  us  to  assess  the  cumulative  effect  of  these  36  T2D 
variants  on  PCa  risk  without  requiring  an  assumption  about 
risk  allele  directionality.  Results  from  the  multilocus  kernel 
tests  indicated  that  the  36  T2D  variants  were  significantly 
associated with PCa risk when HNF1B was both included
in and excluded from the models, which suggests
that  common  pathways  could  be  involved  in  both  T2D  and 
PCa. 
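
The difference between the unweighted and the log odds-weighted GRS discussed above can be illustrated with a minimal sketch. The three SNPs, their T2D odds ratios, and the two genotype vectors below are hypothetical values chosen for illustration only; they are not taken from the study data, and the function names are ours.

```python
import math


def unweighted_grs(risk_allele_counts):
    """Unweighted GRS: the total number of T2D risk alleles carried (0, 1, or 2 per SNP)."""
    return sum(risk_allele_counts)


def weighted_grs(risk_allele_counts, t2d_odds_ratios):
    """Weighted GRS: each risk-allele count is multiplied by the log odds ratio for T2D."""
    if len(risk_allele_counts) != len(t2d_odds_ratios):
        raise ValueError("one odds ratio is required per SNP")
    return sum(
        count * math.log(odds_ratio)
        for count, odds_ratio in zip(risk_allele_counts, t2d_odds_ratios)
    )


# Hypothetical per-allele T2D odds ratios for three SNPs (illustrative, not study values).
t2d_odds_ratios = [1.40, 1.10, 1.05]

# Two hypothetical individuals who each carry two risk alleles in total.
person_a = [2, 0, 0]  # both alleles at the large-effect SNP
person_b = [0, 1, 1]  # one allele at each small-effect SNP

unweighted_a = unweighted_grs(person_a)
unweighted_b = unweighted_grs(person_b)
weighted_a = weighted_grs(person_a, t2d_odds_ratios)
weighted_b = weighted_grs(person_b, t2d_odds_ratios)

print(f"unweighted: A={unweighted_a}, B={unweighted_b}")
print(f"weighted:   A={weighted_a:.3f}, B={weighted_b:.3f}")
```

In this sketch the two individuals are indistinguishable under the unweighted score, whereas the weighted score ranks the carrier of the large-effect alleles higher. This is the sense in which a weighted GRS can detect an association that an unweighted GRS misses when a few variants carry most of the signal.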

A  potential  limitation  of  this  study  is  that  information  on 
diabetes phenotype was self-reported (43). However, previous
studies have shown that self-reporting of diabetes has
up  to  97%  agreement  with  medical  records  (51,  52). 
Another  limitation  is  that  we  could  not  differentiate 
between  cases  of  type  1  diabetes  and  T2D,  although  the 
median age (62 years; interquartile range, 55-70) and ethnicity
of our study population were such that the majority
of  diabetes  cases  were  likely  to  be  T2D  (53).  Furthermore, 
BPC3  data  on  T2D  status  were  available  only  at  baseline, 
and  although  this  could  have  resulted  in  underestimation  of 
the  true  prevalence  of  diabetes  in  our  study  population,  it 
did  guard  against  potential  reverse  causality. 

Our  study  showed  a  highly  significant  inverse  relation 
between  T2D  and  PCa.  The  estimate  was  adjusted  for  body 
mass index, age at baseline, and cohort indicator and is unlikely
to be due to chance or uncontrolled bias. To our
knowledge,  this  is  the  largest  case-control  study  in  which 
this  inverse  association  has  been  examined,  and  our 
estimate (OR = 0.76) is comparable to, albeit slightly stronger
than, the point estimates reported in meta-analyses
and  other  studies,  including  prior  reports  from  2  cohorts 
in  the  BPC3  (i.e.,  relative  risks  ranged  from  0.84  to  0.91) 
(3-5,  54). 
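
As a rough consistency check on the reported T2D-PCa estimate, a Wald z statistic and two-sided P value can be approximated from the odds ratio and its 95% confidence interval in Table 3. The sketch below assumes a symmetric confidence interval on the log odds scale; because the published values are rounded to two decimals, the result is expected to agree with the reported P value only in order of magnitude. The function name is ours and is not part of the original analysis.

```python
import math


def wald_p_from_or_ci(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Approximate the Wald z statistic and two-sided P value from an OR and its 95% CI.

    Assumes the interval is symmetric on the log scale: log(OR) +/- z_crit * SE.
    """
    log_or = math.log(odds_ratio)
    standard_error = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = log_or / standard_error
    # Two-sided P value for a standard normal statistic: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Rounded values from Table 3 for the T2D-prostate cancer association.
z, p_value = wald_p_from_or_ci(0.76, 0.66, 0.87)
print(f"z = {z:.2f}, two-sided P = {p_value:.2e}")
```

With these rounded inputs the approximation yields a P value on the order of 10^-4, which is in line with the reported value of 8.13 x 10^-5.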

We further assessed the potential for T2D phenotype to
mediate the association of HNF1B with PCa risk. Results indicated
a highly significant direct association between
HNF1B  and  PCa  risk,  but  there  was  no  significant  evidence 
for  an  indirect  association.  Although  other  investigators 
have  observed  a  significant  relation  between  HNF1B  and 
T2D  risk  (8,  9),  we  did  not,  which  indicates  that  our 
sample  set  might  have  lacked  sufficient  statistical  power  to 
detect  this  effect.  The  lack  of  a  mediation  role  for  diabetes 
phenotype in the HNF1B-PCa association has been reported
elsewhere in a smaller subset of the BPC3 data (16),
although  larger  studies  are  needed  to  more  definitively  rule 
out  the  potential  for  mediation. 
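
The mediation results in Table 3 can be read through the usual odds-ratio decomposition, in which the total effect is the product of the natural direct and natural indirect effects. The sketch below applies that decomposition to the rounded Table 3 estimates. The proportion mediated on the log odds scale is only one of several conventions in use, and this section does not state which measure, if any, the original analysis reported, so that part should be treated as an assumption.

```python
import math


def total_effect_or(direct_or, indirect_or):
    """Total effect on the odds ratio scale: natural direct effect times natural indirect effect."""
    return direct_or * indirect_or


def proportion_mediated_log_scale(direct_or, indirect_or):
    """Share of the total log odds ratio attributable to the indirect (mediated) path.

    This is one common convention; other definitions exist on the odds ratio scale.
    """
    log_total = math.log(direct_or) + math.log(indirect_or)
    if log_total == 0:
        raise ValueError("total effect is null; proportion mediated is undefined")
    return math.log(indirect_or) / log_total


# Rounded estimates from Table 3 (HNF1B rs7501939, diabetes phenotype as intermediate).
natural_direct_or = 0.83
natural_indirect_or = 1.00

total_or = total_effect_or(natural_direct_or, natural_indirect_or)
proportion = proportion_mediated_log_scale(natural_direct_or, natural_indirect_or)
print(f"total effect OR = {total_or:.2f}, proportion mediated = {abs(proportion):.2f}")
```

Because the natural indirect effect is 1.00, the product reproduces the reported total effect of 0.83 and the proportion mediated is zero, which matches the conclusion that diabetes phenotype did not mediate the HNF1B-PCa association in these data.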

The  majority  of  our  analysis,  excluding  the  mediation 
analysis, was conducted on data from a genome-wide association
study of advanced PCa. Although there is concern
that  results  from  our  study  might  not  be  generalizable  to 
other subtypes of PCa, the overwhelming number of similarities
between our analysis and others indicates that T2D
risk  variants  have  a  similar  effect  on  advanced  PCa  risk 
and  on  total  PCa  risk.  This  is  in  agreement  with  association 
studies  comparing  PCa  germline  variants  that  show  very 
few  examples  of  different  effects  by  disease  aggressiveness. 

In  conclusion,  our  data  provide  additional  evidence  for  a 
relation  between  T2D  and  PCa.  Current  investigations  of  a 
shared genetic background that could underlie this observed
association are still in their infancy but suggest that a
genetic  predisposition  to  T2D  might  also  be  associated 
with  PCa  risk.  Future  studies  should  further  investigate  the 
potential  genetic  factors  that  link  these  two  common 
chronic  diseases. 